Commit 4d860db

msfussell, yaron2, and hhunter-ms authored
Update to observability docs for OTEL (dapr#2876)
* otel doc Signed-off-by: msfussell <[email protected]> * Update daprdocs/content/en/developing-applications/building-blocks/observability/tracing-overview.md Co-authored-by: Hannah Hunter <[email protected]> Signed-off-by: Mark Fussell <[email protected]> * Update daprdocs/content/en/developing-applications/building-blocks/observability/tracing-overview.md Co-authored-by: Hannah Hunter <[email protected]> Signed-off-by: Mark Fussell <[email protected]> * Update daprdocs/content/en/developing-applications/building-blocks/observability/tracing-overview.md Co-authored-by: Hannah Hunter <[email protected]> Signed-off-by: Mark Fussell <[email protected]> * Update daprdocs/content/en/developing-applications/building-blocks/observability/tracing-overview.md Co-authored-by: Hannah Hunter <[email protected]> Signed-off-by: Mark Fussell <[email protected]> * Update daprdocs/content/en/developing-applications/building-blocks/observability/tracing-overview.md Co-authored-by: Hannah Hunter <[email protected]> Signed-off-by: Mark Fussell <[email protected]> * Update daprdocs/content/en/operations/monitoring/metrics/metrics-overview.md Co-authored-by: Hannah Hunter <[email protected]> Signed-off-by: Mark Fussell <[email protected]> * Update daprdocs/content/en/operations/monitoring/tracing/otel-collector/_index.md Co-authored-by: Hannah Hunter <[email protected]> Signed-off-by: Mark Fussell <[email protected]> * Update daprdocs/content/en/operations/monitoring/tracing/otel-collector/open-telemetry-collector.md Co-authored-by: Hannah Hunter <[email protected]> Signed-off-by: Mark Fussell <[email protected]> * Update daprdocs/content/en/operations/monitoring/tracing/otel-collector/open-telemetry-collector.md Co-authored-by: Hannah Hunter <[email protected]> Signed-off-by: Mark Fussell <[email protected]> * Update daprdocs/content/en/operations/monitoring/tracing/setup-tracing.md Co-authored-by: Yaron Schneider <[email protected]> Signed-off-by: Mark Fussell <[email protected]> * 
Fixed URL address * Update daprdocs/content/en/developing-applications/building-blocks/observability/tracing-overview.md Co-authored-by: Hannah Hunter <[email protected]> Signed-off-by: Mark Fussell <[email protected]> * Update daprdocs/content/en/developing-applications/building-blocks/observability/tracing-overview.md Co-authored-by: Hannah Hunter <[email protected]> Signed-off-by: Mark Fussell <[email protected]> * Update daprdocs/content/en/developing-applications/building-blocks/observability/tracing-overview.md Co-authored-by: Hannah Hunter <[email protected]> Signed-off-by: Mark Fussell <[email protected]> * Update daprdocs/content/en/developing-applications/building-blocks/observability/tracing-overview.md Co-authored-by: Hannah Hunter <[email protected]> Signed-off-by: Mark Fussell <[email protected]> * Update daprdocs/content/en/developing-applications/building-blocks/observability/tracing-overview.md Co-authored-by: Hannah Hunter <[email protected]> Signed-off-by: Mark Fussell <[email protected]> * Update daprdocs/content/en/developing-applications/building-blocks/observability/tracing-overview.md Co-authored-by: Hannah Hunter <[email protected]> Signed-off-by: Mark Fussell <[email protected]> * Apply suggestions from code review Co-authored-by: Hannah Hunter <[email protected]> Signed-off-by: Mark Fussell <[email protected]> * Update daprdocs/content/en/operations/monitoring/metrics/metrics-overview.md Co-authored-by: Hannah Hunter <[email protected]> Signed-off-by: Mark Fussell <[email protected]> Signed-off-by: msfussell <[email protected]> Signed-off-by: Mark Fussell <[email protected]> Co-authored-by: Yaron Schneider <[email protected]> Co-authored-by: Hannah Hunter <[email protected]>
1 parent 8f08e68 commit 4d860db

29 files changed

+219
-598
lines changed

.gitignore

+8-1
@@ -6,4 +6,11 @@ daprdocs/public
66
daprdocs/resources/_gen
77
.venv/
88
.hugo_build.lock
9-
.dccache
9+
.dccache
10+
.DS_Store
11+
daprdocs/.DS_Store
12+
daprdocs/content/.DS_Store
13+
daprdocs/content/en/.DS_Store
14+
daprdocs/resources/.DS_Store
15+
daprdocs/static/.DS_Store
16+
daprdocs/static/presentations/.DS_Store

daprdocs/content/en/concepts/observability-concept.md

+12-17
@@ -4,41 +4,36 @@ title: "Observability"
44
linkTitle: "Observability"
55
weight: 500
66
description: >
7-
Monitor applications through tracing, metrics, logs and health
7+
Observe applications through tracing, metrics, logs and health
88
---
99

10-
When building an application, understanding how the system is behaving is an important part of operating it - this includes having the ability to observe the internal calls of an application, gauging its performance and becoming aware of problems as soon as they occur. This is challenging for any system, but even more so for a distributed system comprised of multiple microservices where a flow, made of several calls, may start in one microservices but continue in another. Observability is critical in production environments, but also useful during development to understand bottlenecks, improve performance and perform basic debugging across the span of microservices.
10+
When building an application, understanding how the system is behaving is an important part of operating it - this includes having the ability to observe the internal calls of an application, gauging its performance and becoming aware of problems as soon as they occur. This is challenging for any system, but even more so for a distributed system comprised of multiple microservices where a flow, made of several calls, may start in one microservice but continue in another. Observability is critical in production environments, but also useful during development to understand bottlenecks, improve performance and perform basic debugging across the span of microservices.
1111

12-
While some data points about an application can be gathered from the underlying infrastructure (e.g. memory consumption, CPU usage), other meaningful information must be collected from an "application-aware" layer - one that can show how an important series of calls is executed across microservices. This usually means a developer must add some code to instrument an application for this purpose. Often, instrumentation code is simply meant to send collected data such as traces and metrics to an external monitoring tool or service that can help store, visualize and analyze all this information.
12+
While some data points about an application can be gathered from the underlying infrastructure (for example, memory consumption and CPU usage), other meaningful information must be collected from an "application-aware" layer - one that can show how an important series of calls is executed across microservices. This usually means a developer must add some code to instrument an application for this purpose. Often, instrumentation code is simply meant to send collected data such as traces and metrics to observability tools or services that can help store, visualize and analyze all this information.
1313

14-
Having to maintain this code, which is not part of the core logic of the application, is another burden on the developer, sometimes requiring understanding the monitoring tools' APIs, using additional SDKs etc. This instrumentation may also add to the portability challenges of an application, which may require different instrumentation depending on where the application is deployed. For example, different cloud providers offer different monitoring solutions and an on-prem deployment might require an on-prem solution.
14+
Having to maintain this code, which is not part of the core logic of the application, is a burden on the developer, sometimes requiring understanding the observability tools' APIs, using additional SDKs etc. This instrumentation may also add to the portability challenges of an application, which may require different instrumentation depending on where the application is deployed. For example, different cloud providers offer different observability tools and an on-prem deployment might require an on-prem solution.
1515

1616
## Observability for your application with Dapr
17-
When building an application which leverages Dapr building blocks to perform service-to-service calls and pub/sub messaging, Dapr offers an advantage with respect to [distributed tracing]({{<ref tracing>}}). Because this inter-service communication flows through the Dapr sidecar, the sidecar is in a unique position to offload the burden of application-level instrumentation.
17+
When building an application which leverages Dapr API building blocks to perform service-to-service calls and pub/sub messaging, Dapr offers an advantage with respect to [distributed tracing]({{<ref tracing>}}). Because this inter-service communication flows through the Dapr sidecar, the sidecar is in a unique position to offload the burden of application-level instrumentation.
1818

1919
### Distributed tracing
20-
Dapr can be [configured to emit tracing data]({{<ref setup-tracing.md>}}), and because Dapr does so using widely adopted protocols such as the [Zipkin](https://zipkin.io) protocol, it can be easily integrated with multiple [monitoring backends]({{<ref supported-tracing-backends>}}).
20+
Dapr can be [configured to emit tracing data]({{<ref setup-tracing.md>}}), and because Dapr does so using the widely adopted protocols of [OpenTelemetry (OTEL)](https://opentelemetry.io/) and [Zipkin](https://zipkin.io), it can be easily integrated with multiple observability tools.
2121

2222
<img src="/images/observability-tracing.png" width=1000 alt="Distributed tracing with Dapr">
2323

24-
### OpenTelemetry collector
25-
Dapr can also be configured to work with the [OpenTelemetry Collector]({{<ref open-telemetry-collector>}}) which offers even more compatibility with external monitoring tools.
24+
### Automatic tracing context generation
25+
Dapr uses the [W3C tracing]({{<ref w3c-tracing-overview>}}) specification for tracing context, included as part of OpenTelemetry (OTEL), to generate and propagate the context header for the application or to propagate user-provided context headers. This means that you get tracing by default with Dapr.
2626

27-
<img src="/images/observability-opentelemetry-collector.png" width=1000 alt="Distributed tracing via OpenTelemetry collector">
28-
29-
### Tracing context
30-
Dapr uses [W3C tracing]({{<ref w3c-tracing>}}) specification for tracing context and can generate and propagate the context header itself or propagate user-provided context headers.
31-
32-
## Observability for the Dapr sidecar and system services
33-
As for other parts of your system, you will want to be able to observe Dapr itself and collect metrics and logs emitted by the Dapr sidecar that runs along each microservice, as well as the Dapr-related services in your environment such as the control plane services that are deployed for a Dapr-enabled Kubernetes cluster.
27+
## Observability for the Dapr sidecar and control plane
28+
You also want to be able to observe Dapr itself by collecting metrics on performance, throughput, and latency, along with logs emitted by the Dapr sidecar and the Dapr control plane services. Dapr sidecars have a health endpoint that can be probed to indicate their health status.
3429

3530
<img src="/images/observability-sidecar.png" width=1000 alt="Dapr sidecar metrics, logs and health checks">
3631

3732
### Logging
38-
Dapr generates [logs]({{<ref "logs.md">}}) to provide visibility into sidecar operation and to help users identify issues and perform debugging. Log events contain warning, error, info, and debug messages produced by Dapr system services. Dapr can also be configured to send logs to collectors such as [Fluentd]({{< ref fluentd.md >}}) and [Azure Monitor]({{< ref azure-monitor.md >}}) so they can be easily searched, analyzed and provide insights.
33+
Dapr generates [logs]({{<ref "logs.md">}}) to provide visibility into sidecar operation and to help users identify issues and perform debugging. Log events contain warning, error, info, and debug messages produced by Dapr system services. Dapr can also be configured to send logs to collectors such as [Fluentd]({{< ref fluentd.md >}}), [Azure Monitor]({{< ref azure-monitor.md >}}), and other observability tools so they can be searched and analyzed to provide insights.
3934

4035
### Metrics
41-
Metrics are the series of measured values and counts that are collected and stored over time. [Dapr metrics]({{<ref "metrics">}}) provide monitoring capabilities to understand the behavior of the Dapr sidecar and system services. For example, the metrics between a Dapr sidecar and the user application show call latency, traffic failures, error rates of requests, etc. Dapr [system services metrics](https://github.com/dapr/dapr/blob/master/docs/development/dapr-metrics.md) show sidecar injection failures and the health of system services, including CPU usage, number of actor placements made, etc.
36+
Metrics are the series of measured values and counts that are collected and stored over time. [Dapr metrics]({{<ref "metrics">}}) provide monitoring capabilities to understand the behavior of the Dapr sidecar and control plane. For example, the metrics between a Dapr sidecar and the user application show call latency, traffic failures, error rates of requests, etc. Dapr [control plane metrics](https://github.com/dapr/dapr/blob/master/docs/development/dapr-metrics.md) show sidecar injection failures and the health of control plane services, including CPU usage, number of actor placements made, etc.
4237

4338
### Health checks
4439
The Dapr sidecar exposes an HTTP endpoint for [health checks]({{<ref sidecar-health.md>}}). With this API, user code or hosting environments can probe the Dapr sidecar to determine its status and identify issues with sidecar readiness.

daprdocs/content/en/developing-applications/building-blocks/observability/tracing-overview.md

+93-33
@@ -3,53 +3,113 @@ type: docs
33
title: "Distributed tracing"
44
linkTitle: "Distributed tracing"
55
weight: 1000
6-
description: "Use Dapr tracing to get visibility for distributed application"
6+
description: "Use tracing to get visibility into your application"
77
---
88

9-
Dapr uses the Zipkin protocol for distributed traces and metrics collection. Due to the ubiquity of the Zipkin protocol, many backends are supported out of the box, for examples [Stackdriver](https://cloud.google.com/stackdriver), [Zipkin](https://zipkin.io), [New Relic](https://newrelic.com) and others. Combining with the OpenTelemetry Collector, Dapr can export traces to many other backends including but not limted to [Azure Monitor](https://azure.microsoft.com/services/monitor/), [Datadog](https://www.datadoghq.com), Instana, [Jaeger](https://www.jaegertracing.io/), and [SignalFX](https://www.signalfx.com/).
9+
Dapr uses the OpenTelemetry (OTEL) and Zipkin protocols for distributed traces. OTEL is the industry standard and is the recommended trace protocol to use.
1010

11-
<img src="/images/tracing.png" width=600>
11+
Most observability tools support OTEL, for example [Google Cloud Operations](https://cloud.google.com/products/operations), [New Relic](https://newrelic.com), [Azure Monitor](https://azure.microsoft.com/services/monitor/), [Datadog](https://www.datadoghq.com), Instana, [Jaeger](https://www.jaegertracing.io/), and [SignalFX](https://www.signalfx.com/).
1212

13-
## Tracing design
13+
## Scenarios
14+
Tracing is used with the service invocation and pub/sub APIs. You can flow trace context between services that use these APIs.
1415

15-
Dapr adds a HTTP/gRPC middleware to the Dapr sidecar. The middleware intercepts all Dapr and application traffic and automatically injects correlation IDs to trace distributed transactions. This design has several benefits:
16+
There are two scenarios for how tracing is used:
17+
1. Dapr generates the trace context and you propagate the trace context to another service.
18+
2. You generate the trace context and Dapr propagates the trace context to a service.
1619

17-
* No need for code instrumentation. All traffic is automatically traced with configurable tracing levels.
18-
* Consistent tracing behavior across microservices. Tracing is configured and managed on Dapr sidecar so that it remains consistent across services made by different teams and potentially written in different programming languages.
19-
* Configurable and extensible. By leveraging the Zipkin API and the OpenTelemetry Collector, Dapr tracing can be configured to work with popular tracing backends, including custom backends a customer may have.
20-
* You can define and enable multiple exporters at the same time.
20+
### Propagating sequential service calls
21+
Dapr takes care of creating the trace headers. However, when there are more than two services, you're responsible for propagating the trace headers between them. Let's go through the scenarios with examples:
2122

22-
## W3C Correlation ID
23+
1. Single service invocation call (`service A -> service B`)
2324

24-
Dapr uses the standard W3C Trace Context headers. For HTTP requests, Dapr uses `traceparent` header. For gRPC requests, Dapr uses `grpc-trace-bin` header. When a request arrives without a trace ID, Dapr creates a new one. Otherwise, it passes the trace ID along the call chain.
25+
Dapr generates the trace headers in service A, which are then propagated from service A to service B. No further propagation is needed.
2526

26-
Read [W3C distributed tracing]({{< ref w3c-tracing >}}) for more background on W3C Trace Context.
27+
2. Multiple sequential service invocation calls (`service A -> service B -> service C`)
2728

28-
## Configuration
29+
Dapr generates the trace headers at the beginning of the request in service A, which are then propagated to service B. You are now responsible for taking the headers and propagating them to service C, since this is specific to your application.
30+
31+
`service A -> service B -> propagate trace headers to -> service C` and so on to further Dapr-enabled services.
2932

30-
Dapr uses probabilistic sampling. The sample rate defines the probability a tracing span will be sampled and can have a value between 0 and 1 (inclusive). The default sample rate is 0.0001 (i.e. 1 in 10,000 spans is sampled).
33+
In other words, if the app calls Dapr and wants to continue an existing trace (using the trace header of an existing span), it must always propagate that header to Dapr (from service B to service C in this case). Dapr always propagates trace spans to an application.
3134

32-
To change the default tracing behavior, use a configuration file (in self hosted mode) or a Kubernetes configuration object (in Kubernetes mode). For example, the following configuration object changes the sample rate to 1 (i.e. every span is sampled), and sends trace using Zipkin protocol to the Zipkin server at http://zipkin.default.svc.cluster.local
35+
{{% alert title="Note" color="primary" %}}
36+
There are no helper methods exposed in Dapr SDKs to propagate and retrieve trace context. You need to use HTTP/gRPC clients to propagate and retrieve trace headers through HTTP headers and gRPC metadata.
37+
{{% /alert %}}
3338

34-
```yaml
35-
apiVersion: dapr.io/v1alpha1
36-
kind: Configuration
37-
metadata:
38-
name: tracing
39-
namespace: default
40-
spec:
41-
tracing:
42-
samplingRate: "1"
43-
zipkin:
44-
endpointAddress: "http://zipkin.default.svc.cluster.local:9411/api/v2/spans"
45-
```
39+
3. Request is from external endpoint (for example, `from a gateway service to a Dapr-enabled service A`)
4640

47-
Note: Changing `samplingRate` to 0 disables tracing altogether.
41+
An external gateway ingress calls Dapr, which generates the trace headers and calls service A. Service A then calls service B and further Dapr-enabled services. You must propagate the headers from service A to service B: `Ingress -> service A -> propagate trace headers -> service B`. This is similar to case 2 above.
4842

49-
See the [References](#references) section for more details on how to configure tracing on local environment and Kubernetes environment.
43+
4. Pub/sub messages
44+
Dapr generates the trace headers in the published message topic. These trace headers are propagated to any services listening on that topic.
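
For case 2 above (`service A -> service B -> service C`), the propagation step in service B can be sketched roughly as follows. This is a minimal illustration, not an SDK feature: the helper names `extract_trace_headers` and `build_invoke_request` are hypothetical, and the sidecar address assumes the Dapr HTTP API is listening on `localhost:3500`. The sketch only builds the outgoing request so the copied headers can be inspected; sending it is left to your HTTP client.

```python
import urllib.request

# W3C trace context headers that service B copies from the incoming request.
TRACE_HEADERS = ("traceparent", "tracestate")


def extract_trace_headers(incoming_headers):
    """Return only the trace context headers, matched case-insensitively."""
    lowered = {name.lower(): value for name, value in incoming_headers.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


def build_invoke_request(app_id, method, incoming_headers, dapr_port=3500):
    """Build (but do not send) a service invocation request to the local sidecar.

    The port and the helper names are assumptions made for this sketch.
    """
    url = f"http://localhost:{dapr_port}/v1.0/invoke/{app_id}/method/{method}"
    headers = extract_trace_headers(incoming_headers)
    return urllib.request.Request(url, headers=headers, method="POST")
```

Service B would call `build_invoke_request("service-c", "orders", request_headers)` with the headers of the request it received, so that the call to service C continues the same trace.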
5045

51-
## References
46+
### Propagating multiple different service calls
47+
In the following scenarios, Dapr does some of the work for you and you need to either create or propagate trace headers.
5248

53-
- [How-To: Setup Application Insights for distributed tracing with OpenTelemetry Collector]({{< ref open-telemetry-collector.md >}})
54-
- [How-To: Set up Zipkin for distributed tracing]({{< ref zipkin.md >}})
55-
- [W3C distributed tracing]({{< ref w3c-tracing >}})
49+
1. Multiple service calls to different services from single service
50+
51+
When you are calling multiple services from a single service (see example below), you need to propagate the trace headers:
52+
53+
```
54+
service A -> service B
55+
[ .. some code logic ..]
56+
service A -> service C
57+
[ .. some code logic ..]
58+
service A -> service D
59+
[ .. some code logic ..]
60+
```
61+
62+
In this case, when service A first calls service B, Dapr generates the trace headers in service A, which are then propagated to service B. These trace headers are returned in the response from service B as part of response headers. You then need to propagate the returned trace context to the next services, service C and service D, as Dapr does not know you want to reuse the same header.
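
The reuse of the returned trace context described above can be sketched like this. The `invoke` callable is a hypothetical stand-in for whatever HTTP client call your application makes through the sidecar; it is assumed to take an app ID and request headers and to return the response headers as a dictionary.

```python
TRACE_HEADERS = ("traceparent", "tracestate")


def carry_trace_context(response_headers):
    """Pick the trace context headers out of a response, case-insensitively."""
    lowered = {name.lower(): value for name, value in response_headers.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


def call_services_in_sequence(invoke, app_ids):
    """Call each service in order, reusing the trace context from the first response.

    `invoke(app_id, headers)` is a hypothetical callable that performs the
    service invocation and returns the response headers.
    """
    trace_headers = {}
    for app_id in app_ids:
        response_headers = invoke(app_id, dict(trace_headers))
        if not trace_headers:
            # First call: Dapr generated the headers, so capture them for reuse.
            trace_headers = carry_trace_context(response_headers)
    return trace_headers
```

With `app_ids` set to service B, service C, and service D, the first call goes out without trace headers, and the calls to service C and service D carry the headers returned from service B.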
63+
64+
### Generating your own trace context headers from non-Daprized applications
65+
66+
You may have chosen to generate your own trace context headers.
67+
Generating your own trace context headers is more unusual and typically not required when calling Dapr. However, there are scenarios where you could specifically choose to add W3C trace headers into a service call; for example, you have an existing application that does not use Dapr. In this case, Dapr still propagates the trace context headers for you. If you decide to generate trace headers yourself, there are three ways this can be done:
68+
69+
1. You can use the industry standard [OpenTelemetry SDKs](https://opentelemetry.io/docs/instrumentation/) to generate trace headers and pass these trace headers to a Dapr-enabled service. This is the preferred method.
70+
71+
2. You can use a vendor SDK that provides a way to generate W3C trace headers and pass them to a Dapr-enabled service.
72+
73+
3. You can handcraft a trace context following [W3C trace context specifications](https://www.w3.org/TR/trace-context/) and pass it to a Dapr-enabled service.
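
As a rough illustration of the third option, the sketch below handcrafts a version `00` `traceparent` value using only the Python standard library. The function name `new_traceparent` is hypothetical; in most cases an OpenTelemetry SDK is the better choice, as noted in the first option.

```python
import secrets


def new_traceparent(sampled=True):
    """Build a version-00 traceparent value: version-traceid-parentid-flags."""
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 lowercase hex chars
    parent_id = secrets.token_hex(8)   # 8 random bytes -> 16 lowercase hex chars
    # The specification treats all-zero IDs as invalid, so regenerate if that ever happens.
    while trace_id == "0" * 32:
        trace_id = secrets.token_hex(16)
    while parent_id == "0" * 16:
        parent_id = secrets.token_hex(8)
    flags = "01" if sampled else "00"  # lowest bit of the flags byte is the sampled flag
    return f"00-{trace_id}-{parent_id}-{flags}"
```

The returned string is sent as the value of the `traceparent` HTTP header on the call to the Dapr-enabled service.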
74+
75+
## W3C trace context
76+
77+
Dapr uses the standard W3C trace context headers.
78+
79+
- For HTTP requests, Dapr uses the `traceparent` header.
80+
- For gRPC requests, Dapr uses the `grpc-trace-bin` header.
81+
82+
When a request arrives without a trace ID, Dapr creates a new one. Otherwise, it passes the trace ID along the call chain.
83+
84+
Read [trace context overview]({{< ref w3c-tracing-overview >}}) for more background on W3C trace context.
85+
86+
## W3C trace headers
87+
These are the specific trace context headers that are generated and propagated by Dapr for HTTP and gRPC.
88+
89+
### Trace context HTTP headers format
90+
When propagating a trace context header from an HTTP response to an HTTP request, you copy these headers.
91+
92+
#### Traceparent header
93+
The traceparent header represents the incoming request in a tracing system in a common format, understood by all vendors.
94+
Here’s an example of a traceparent header.
95+
96+
`traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01`
97+
98+
Find the traceparent fields detailed [here](https://www.w3.org/TR/trace-context/#traceparent-header).
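
To make the field layout concrete, here is a minimal parsing sketch for the version `00` format shown above. The names `TraceParent` and `parse_traceparent` are hypothetical, and the pattern deliberately rejects anything that is not four dash-separated lowercase hex fields, so it does not attempt to handle future versions.

```python
import re
from typing import NamedTuple


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(value):
    """Split a traceparent value into its four fields, or raise ValueError."""
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {value!r}")
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the specification.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError(f"invalid traceparent: {value!r}")
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))
```

For the example header above, this yields version `00`, trace ID `0af7651916cd43dd8448eb211c80319c`, parent ID `b7ad6b7169203331`, and a sampled flag of `True`.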
99+
100+
#### Tracestate header
101+
The tracestate header includes the parent in a potentially vendor-specific format:
102+
103+
`tracestate: congo=t61rcWkgMzE`
104+
105+
Find the tracestate fields detailed [here](https://www.w3.org/TR/trace-context/#tracestate-header).
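
The `tracestate` value is a comma-separated list of `key=value` members. A simplified reading of it might look like the sketch below; `parse_tracestate` is a hypothetical helper, and it skips the key and value character validation that the specification defines.

```python
def parse_tracestate(value):
    """Return the tracestate members as an ordered list of (key, value) pairs."""
    entries = []
    for member in value.split(","):
        member = member.strip()
        if not member:
            continue  # tolerate empty members such as a trailing comma
        key, separator, member_value = member.partition("=")
        if not separator or not key:
            raise ValueError(f"malformed tracestate member: {member!r}")
        entries.append((key, member_value))
    return entries
```

Order is preserved because the specification gives the position of each member a meaning, with the most recently updated entry on the left.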
106+
107+
### Trace context gRPC headers format
108+
In gRPC API calls, trace context is passed through the `grpc-trace-bin` header.
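
The text above does not describe the binary layout of `grpc-trace-bin`. The sketch below assumes the 29-byte OpenCensus binary trace context encoding that is commonly associated with this header (a version byte, then tagged trace ID, span ID, and options fields); treat that layout, and the helper names, as assumptions to verify against your gRPC tooling.

```python
def encode_grpc_trace_bin(trace_id_hex, span_id_hex, sampled=True):
    """Pack trace context into the assumed 29-byte binary layout."""
    trace_id = bytes.fromhex(trace_id_hex)
    span_id = bytes.fromhex(span_id_hex)
    if len(trace_id) != 16 or len(span_id) != 8:
        raise ValueError("trace ID must be 16 bytes and span ID must be 8 bytes")
    return (
        bytes([0])                          # version 0
        + bytes([0]) + trace_id             # field 0: trace ID
        + bytes([1]) + span_id              # field 1: span ID
        + bytes([2, 1 if sampled else 0])   # field 2: trace options
    )


def decode_grpc_trace_bin(data):
    """Unpack the assumed layout into (trace_id_hex, span_id_hex, sampled)."""
    if len(data) != 29 or data[0] != 0 or data[1] != 0 or data[18] != 1 or data[27] != 2:
        raise ValueError("unexpected grpc-trace-bin layout")
    return data[2:18].hex(), data[19:27].hex(), bool(data[28] & 0x01)
```

Because the metadata key ends in `-bin`, gRPC clients carry the value as raw bytes rather than as text.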
109+
110+
## Related Links
111+
112+
- [Observability concepts]({{< ref observability-concept.md >}})
113+
- [W3C Trace Context for distributed tracing]({{< ref w3c-tracing-overview >}})
114+
- [W3C Trace Context specification](https://www.w3.org/TR/trace-context/)
115+
- [Observability quickstart](https://github.com/dapr/quickstarts/tree/master/tutorials/observability)
