daprdocs/content/en/concepts/observability-concept.md
+12-17
Original file line number
Diff line number
Diff line change
@@ -4,41 +4,36 @@ title: "Observability"
4
4
linkTitle: "Observability"
5
5
weight: 500
6
6
description: >
7
-
Monitor applications through tracing, metrics, logs and health
7
+
Observe applications through tracing, metrics, logs and health
8
8
---
9
9
10
-
When building an application, understanding how the system is behaving is an important part of operating it - this includes having the ability to observe the internal calls of an application, gauging its performance and becoming aware of problems as soon as they occur. This is challenging for any system, but even more so for a distributed system comprised of multiple microservices where a flow, made of several calls, may start in one microservices but continue in another. Observability is critical in production environments, but also useful during development to understand bottlenecks, improve performance and perform basic debugging across the span of microservices.
10
+
When building an application, understanding how the system is behaving is an important part of operating it - this includes having the ability to observe the internal calls of an application, gauging its performance and becoming aware of problems as soon as they occur. This is challenging for any system, but even more so for a distributed system comprised of multiple microservices where a flow, made of several calls, may start in one microservice but continue in another. Observability is critical in production environments, but also useful during development to understand bottlenecks, improve performance and perform basic debugging across the span of microservices.
11
11
12
-
While some data points about an application can be gathered from the underlying infrastructure (e.g. memory consumption, CPU usage), other meaningful information must be collected from an "application-aware" layer - one that can show how an important series of calls is executed across microservices. This usually means a developer must add some code to instrument an application for this purpose. Often, instrumentation code is simply meant to send collected data such as traces and metrics to an external monitoring tool or service that can help store, visualize and analyze all this information.
12
+
While some data points about an application can be gathered from the underlying infrastructure (for example, memory consumption and CPU usage), other meaningful information must be collected from an "application-aware" layer - one that can show how an important series of calls is executed across microservices. This usually means a developer must add some code to instrument an application for this purpose. Often, instrumentation code is simply meant to send collected data such as traces and metrics to observability tools or services that can help store, visualize and analyze all this information.
13
13
14
-
Having to maintain this code, which is not part of the core logic of the application, is another burden on the developer, sometimes requiring understanding the monitoring tools' APIs, using additional SDKs etc. This instrumentation may also add to the portability challenges of an application, which may require different instrumentation depending on where the application is deployed. For example, different cloud providers offer different monitoring solutions and an on-prem deployment might require an on-prem solution.
14
+
Having to maintain this code, which is not part of the core logic of the application, is a burden on the developer, sometimes requiring understanding the observability tools' APIs, using additional SDKs etc. This instrumentation may also add to the portability challenges of an application, which may require different instrumentation depending on where the application is deployed. For example, different cloud providers offer different observability tools and an on-prem deployment might require an on-prem solution.
15
15
16
16
## Observability for your application with Dapr
17
-
When building an application which leverages Dapr building blocks to perform service-to-service calls and pub/sub messaging, Dapr offers an advantage with respect to [distributed tracing]({{<ref tracing>}}). Because this inter-service communication flows through the Dapr sidecar, the sidecar is in a unique position to offload the burden of application-level instrumentation.
17
+
When building an application which leverages Dapr API building blocks to perform service-to-service calls and pub/sub messaging, Dapr offers an advantage with respect to [distributed tracing]({{<ref tracing>}}). Because this inter-service communication flows through the Dapr sidecar, the sidecar is in a unique position to offload the burden of application-level instrumentation.
18
18
19
19
### Distributed tracing
20
-
Dapr can be [configured to emit tracing data]({{<ref setup-tracing.md>}}), and because Dapr does so using widely adopted protocols such as the [Zipkin](https://zipkin.io) protocol, it can be easily integrated with multiple [monitoring backends]({{<ref supported-tracing-backends>}}).
20
+
Dapr can be [configured to emit tracing data]({{<ref setup-tracing.md>}}), and because Dapr does so using the widely adopted [OpenTelemetry (OTEL)](https://opentelemetry.io/) and [Zipkin](https://zipkin.io) protocols, it can be easily integrated with multiple observability tools.
21
21
22
22
<img src="/images/observability-tracing.png" width=1000 alt="Distributed tracing with Dapr">
23
23
24
-
### OpenTelemetry collector
25
-
Dapr can also be configured to work with the [OpenTelemetry Collector]({{<ref open-telemetry-collector>}}) which offers even more compatibility with external monitoring tools.
24
+
### Automatic tracing context generation
25
+
Dapr uses the [W3C tracing]({{<ref w3c-tracing-overview>}}) specification for tracing context, included as part of OpenTelemetry (OTEL), to generate and propagate the context header for the application or to propagate user-provided context headers. This means that you get tracing by default with Dapr.
26
26
27
-
<img src="/images/observability-opentelemetry-collector.png" width=1000 alt="Distributed tracing via OpenTelemetry collector">
28
-
29
-
### Tracing context
30
-
Dapr uses [W3C tracing]({{<ref w3c-tracing>}}) specification for tracing context and can generate and propagate the context header itself or propagate user-provided context headers.
31
-
32
-
## Observability for the Dapr sidecar and system services
33
-
As for other parts of your system, you will want to be able to observe Dapr itself and collect metrics and logs emitted by the Dapr sidecar that runs along each microservice, as well as the Dapr-related services in your environment such as the control plane services that are deployed for a Dapr-enabled Kubernetes cluster.
27
+
## Observability for the Dapr sidecar and control plane
28
+
You also want to be able to observe Dapr itself by collecting metrics on performance, throughput and latency, along with the logs emitted by the Dapr sidecar and the Dapr control plane services. Dapr sidecars have a health endpoint that can be probed to indicate their health status.
34
29
35
30
<img src="/images/observability-sidecar.png" width=1000 alt="Dapr sidecar metrics, logs and health checks">
36
31
37
32
### Logging
38
-
Dapr generates [logs]({{<ref "logs.md">}}) to provide visibility into sidecar operation and to help users identify issues and perform debugging. Log events contain warning, error, info, and debug messages produced by Dapr system services. Dapr can also be configured to send logs to collectors such as [Fluentd]({{< ref fluentd.md >}}) and [Azure Monitor]({{< ref azure-monitor.md >}}) so they can be easily searched, analyzed and provide insights.
33
+
Dapr generates [logs]({{<ref "logs.md">}}) to provide visibility into sidecar operation and to help users identify issues and perform debugging. Log events contain warning, error, info, and debug messages produced by Dapr system services. Dapr can also be configured to send logs to collectors such as [Fluentd]({{< ref fluentd.md >}}), [Azure Monitor]({{< ref azure-monitor.md >}}) and other observability tools so they can be searched and analyzed to provide insights.
39
34
40
35
### Metrics
41
-
Metrics are the series of measured values and counts that are collected and stored over time. [Dapr metrics]({{<ref "metrics">}}) provide monitoring capabilities to understand the behavior of the Dapr sidecar and system services. For example, the metrics between a Dapr sidecar and the user application show call latency, traffic failures, error rates of requests, etc. Dapr [system services metrics](https://github.com/dapr/dapr/blob/master/docs/development/dapr-metrics.md) show sidecar injection failures and the health of system services, including CPU usage, number of actor placements made, etc.
36
+
Metrics are the series of measured values and counts that are collected and stored over time. [Dapr metrics]({{<ref "metrics">}}) provide monitoring capabilities to understand the behavior of the Dapr sidecar and control plane. For example, the metrics between a Dapr sidecar and the user application show call latency, traffic failures, error rates of requests, etc. Dapr [control plane metrics](https://github.com/dapr/dapr/blob/master/docs/development/dapr-metrics.md) show sidecar injection failures and the health of control plane services, including CPU usage, number of actor placements made, etc.
42
37
43
38
### Health checks
44
39
The Dapr sidecar exposes an HTTP endpoint for [health checks]({{<ref sidecar-health.md>}}). With this API, user code or hosting environments can probe the Dapr sidecar to determine its status and identify issues with sidecar readiness.
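
As a rough illustration of probing the sidecar from user code, the sketch below assumes the health API is served at `/v1.0/healthz` on the sidecar's HTTP port and that the port is 3500; neither value is stated in this section, so treat both as assumptions and check the health checks page for your deployment. The `opener` parameter is a hypothetical hook added here so the function can be exercised without a running sidecar.

```python
import urllib.error
import urllib.request


def health_url(port=3500, host="127.0.0.1"):
    # Assumed path and default port; adjust to match your sidecar configuration.
    return f"http://{host}:{port}/v1.0/healthz"


def sidecar_is_healthy(port=3500, timeout=2.0, opener=urllib.request.urlopen):
    """Return True when the probe gets a 2xx response, False on any failure."""
    try:
        with opener(health_url(port), timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status raised as HTTPError.
        return False
```

A hosting environment would typically call a probe like this on an interval and hold back traffic until it returns `True`.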
daprdocs/content/en/developing-applications/building-blocks/observability/tracing-overview.md
+93-33
Original file line number
Diff line number
Diff line change
@@ -3,53 +3,113 @@ type: docs
3
3
title: "Distributed tracing"
4
4
linkTitle: "Distributed tracing"
5
5
weight: 1000
6
-
description: "Use Dapr tracing to get visibility for distributed application"
6
+
description: "Use tracing to get visibility into your application"
7
7
---
8
8
9
-
Dapr uses the Zipkin protocol for distributed traces and metrics collection. Due to the ubiquity of the Zipkin protocol, many backends are supported out of the box, for examples [Stackdriver](https://cloud.google.com/stackdriver), [Zipkin](https://zipkin.io), [New Relic](https://newrelic.com)and others. Combining with the OpenTelemetry Collector, Dapr can export traces to many other backends including but not limted to [Azure Monitor](https://azure.microsoft.com/services/monitor/), [Datadog](https://www.datadoghq.com), Instana, [Jaeger](https://www.jaegertracing.io/), and [SignalFX](https://www.signalfx.com/).
9
+
Dapr uses the OpenTelemetry (OTEL) and Zipkin protocols for distributed traces. OTEL is the industry standard and is the recommended trace protocol to use.
10
10
11
-
<img src="/images/tracing.png" width=600>
11
+
Most observability tools support OTEL, for example [Google Cloud Operations](https://cloud.google.com/products/operations), [New Relic](https://newrelic.com), [Azure Monitor](https://azure.microsoft.com/services/monitor/), [Datadog](https://www.datadoghq.com), Instana, [Jaeger](https://www.jaegertracing.io/), and [SignalFX](https://www.signalfx.com/).
12
12
13
-
## Tracing design
13
+
## Scenarios
14
+
Tracing is used with the service invocation and pub/sub APIs. You can flow trace context between services that use these APIs.
14
15
15
-
Dapr adds a HTTP/gRPC middleware to the Dapr sidecar. The middleware intercepts all Dapr and application traffic and automatically injects correlation IDs to trace distributed transactions. This design has several benefits:
16
+
There are two scenarios for how tracing is used:
17
+
1. Dapr generates the trace context and you propagate the trace context to another service.
18
+
2. You generate the trace context and Dapr propagates the trace context to a service.
16
19
17
-
* No need for code instrumentation. All traffic is automatically traced with configurable tracing levels.
18
-
* Consistent tracing behavior across microservices. Tracing is configured and managed on Dapr sidecar so that it remains consistent across services made by different teams and potentially written in different programming languages.
19
-
* Configurable and extensible. By leveraging the Zipkin API and the OpenTelemetry Collector, Dapr tracing can be configured to work with popular tracing backends, including custom backends a customer may have.
20
-
* You can define and enable multiple exporters at the same time.
20
+
### Propagating sequential service calls
21
+
Dapr takes care of creating the trace headers. However, when there are more than two services, you're responsible for propagating the trace headers between them. Let's go through the scenarios with examples:
21
22
22
-
## W3C Correlation ID
23
+
1. Single service invocation call (`service A -> service B`)
23
24
24
-
Dapr uses the standard W3C Trace Context headers. For HTTP requests, Dapr uses `traceparent` header. For gRPC requests, Dapr uses `grpc-trace-bin` header. When a request arrives without a trace ID, Dapr creates a new one. Otherwise, it passes the trace ID along the call chain.
25
+
Dapr generates the trace headers in service A, which are then propagated from service A to service B. No further propagation is needed.
25
26
26
-
Read [W3C distributed tracing]({{< ref w3c-tracing >}}) for more background on W3C Trace Context.
27
+
2. Multiple sequential service invocation calls (`service A -> service B -> service C`)
27
28
28
-
## Configuration
29
+
Dapr generates the trace headers at the beginning of the request in service A, which are then propagated to service B. You are now responsible for taking the headers and propagating them to service C, since this is specific to your application.
30
+
31
+
`service A -> service B -> propagate trace headers to -> service C` and so on to further Dapr-enabled services.
29
32
30
-
Dapr uses probabilistic sampling. The sample rate defines the probability a tracing span will be sampled and can have a value between 0 and 1 (inclusive). The default sample rate is 0.0001 (i.e. 1 in 10,000 spans is sampled).
33
+
In other words, if the app is calling Dapr and wants to trace with an existing span (trace header), it must always propagate that header to Dapr (from service B to service C in this case). Dapr always propagates trace spans to an application.
31
34
32
-
To change the default tracing behavior, use a configuration file (in self hosted mode) or a Kubernetes configuration object (in Kubernetes mode). For example, the following configuration object changes the sample rate to 1 (i.e. every span is sampled), and sends trace using Zipkin protocol to the Zipkin server at http://zipkin.default.svc.cluster.local
35
+
{{% alert title="Note" color="primary" %}}
36
+
There are no helper methods exposed in Dapr SDKs to propagate and retrieve trace context. You need to use HTTP/gRPC clients to propagate and retrieve trace headers through HTTP headers and gRPC metadata.
3. Request is from an external endpoint (for example, `from a gateway service to a Dapr-enabled service A`)
46
40
47
-
Note: Changing `samplingRate` to 0 disables tracing altogether.
41
+
An external gateway ingress calls Dapr, which generates the trace headers and calls service A. Service A then calls service B and further Dapr-enabled services. You must propagate the headers from service A to service B: `Ingress -> service A -> propagate trace headers -> service B`. This is similar to case 2 above.
48
42
49
-
See the [References](#references) section for more details on how to configure tracing on local environment and Kubernetes environment.
43
+
4. Pub/sub messages
44
+
Dapr generates the trace headers in the published message topic. These trace headers are propagated to any services listening on that topic.
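
For the sequential case (`service A -> service B -> service C`), the propagation step in service B can be sketched as follows. This is a minimal illustration, not an SDK helper: the invocation URL shape, the port 3500, and the `service-c` and `orders` names are assumptions made up for the example, and the request is only built, never sent.

```python
import urllib.request

# W3C trace context headers to carry forward (HTTP case).
TRACE_HEADERS = ("traceparent", "tracestate")


def build_call_to_service_c(incoming_headers, dapr_port=3500):
    """Build service B's outgoing request to service C, carrying the trace context."""
    # Hypothetical sidecar invocation URL; app id and method name are invented.
    url = f"http://127.0.0.1:{dapr_port}/v1.0/invoke/service-c/method/orders"
    request = urllib.request.Request(url, method="GET")
    # Header names are case-insensitive, so normalise before looking them up.
    lowered = {name.lower(): value for name, value in incoming_headers.items()}
    for name in TRACE_HEADERS:
        if name in lowered:
            request.add_header(name, lowered[name])
    return request
```

Only the trace headers are copied; unrelated headers from the incoming request are deliberately left behind.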
50
45
51
-
## References
46
+
### Propagating multiple different service calls
47
+
In the following scenarios, Dapr does some of the work for you and you need to either create or propagate trace headers.
52
48
53
-
- [How-To: Setup Application Insights for distributed tracing with OpenTelemetry Collector]({{< ref open-telemetry-collector.md >}})
54
-
- [How-To: Set up Zipkin for distributed tracing]({{< ref zipkin.md >}})
1. Multiple service calls to different services from a single service
50
+
51
+
When you are calling multiple services from a single service (see example below), you need to propagate the trace headers:
52
+
53
+
```
54
+
service A -> service B
55
+
[ .. some code logic ..]
56
+
service A -> service C
57
+
[ .. some code logic ..]
58
+
service A -> service D
59
+
[ .. some code logic ..]
60
+
```
61
+
62
+
In this case, when service A first calls service B, Dapr generates the trace headers in service A, which are then propagated to service B. These trace headers are returned in the response from service B as part of response headers. You then need to propagate the returned trace context to the next services, service C and service D, as Dapr does not know you want to reuse the same header.
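
The fan-out flow described above can be sketched roughly like this. The `call(target, headers)` function is hypothetical: it stands in for whatever HTTP client performs the invocation and returns the response headers as a dictionary.

```python
def trace_context_from_response(response_headers):
    """Pick the W3C trace headers out of a response, using lowercase names."""
    wanted = ("traceparent", "tracestate")
    lowered = {name.lower(): value for name, value in response_headers.items()}
    return {name: lowered[name] for name in wanted if name in lowered}


def fan_out(call, targets):
    """Call each target in order, reusing the trace context from the first response.

    `call(target, headers)` is a hypothetical function that performs the
    invocation and returns the response headers as a dict.
    """
    carried = {}
    for target in targets:
        response_headers = call(target, dict(carried))
        if not carried:
            # First call: Dapr generated the context; keep it for later calls.
            carried = trace_context_from_response(response_headers)
    return carried
```

With `targets` set to service B, C and D, the first call goes out with no trace headers and the remaining calls reuse the context returned by service B.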
63
+
64
+
### Generating your own trace context headers from non-Daprized applications
65
+
66
+
You may have chosen to generate your own trace context headers.
67
+
Generating your own trace context headers is more unusual and typically not required when calling Dapr. However, there are scenarios where you could specifically choose to add W3C trace headers into a service call; for example, you have an existing application that does not use Dapr. In this case, Dapr still propagates the trace context headers for you. If you decide to generate trace headers yourself, there are three ways this can be done:
68
+
69
+
1. You can use the industry standard [OpenTelemetry SDKs](https://opentelemetry.io/docs/instrumentation/) to generate trace headers and pass these trace headers to a Dapr-enabled service. This is the preferred method.
70
+
71
+
2. You can use a vendor SDK that provides a way to generate W3C trace headers and pass them to a Dapr-enabled service.
72
+
73
+
3. You can handcraft a trace context following the [W3C trace context specification](https://www.w3.org/TR/trace-context/) and pass it to a Dapr-enabled service.
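
To make the third option concrete, here is a minimal sketch of handcrafting a `traceparent` value. It assumes version `00` of the W3C format (`version-traceid-parentid-flags`, lowercase hex) and is only an illustration; the OpenTelemetry SDKs remain the preferred route. The function name is invented for this example.

```python
import secrets


def new_traceparent(sampled=True):
    """Build a version-00 traceparent value with random trace and parent ids."""
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 lowercase hex chars
    parent_id = secrets.token_hex(8)   # 8 random bytes -> 16 lowercase hex chars
    # All-zero ids are invalid in the spec; regenerate in that very unlikely case.
    while int(trace_id, 16) == 0:
        trace_id = secrets.token_hex(16)
    while int(parent_id, 16) == 0:
        parent_id = secrets.token_hex(8)
    flags = "01" if sampled else "00"  # lowest bit of the flags byte marks "sampled"
    return f"00-{trace_id}-{parent_id}-{flags}"
```

The resulting string is what you would send as the `traceparent` HTTP header on the call into a Dapr-enabled service.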
74
+
75
+
## W3C trace context
76
+
77
+
Dapr uses the standard W3C trace context headers.
78
+
79
+
- For HTTP requests, Dapr uses the `traceparent` header.
80
+
- For gRPC requests, Dapr uses the `grpc-trace-bin` header.
81
+
82
+
When a request arrives without a trace ID, Dapr creates a new one. Otherwise, it passes the trace ID along the call chain.
83
+
84
+
Read [trace context overview]({{< ref w3c-tracing-overview >}}) for more background on W3C trace context.
85
+
86
+
## W3C trace headers
87
+
These are the specific trace context headers that are generated and propagated by Dapr for HTTP and gRPC.
88
+
89
+
### Trace context HTTP headers format
90
+
When propagating a trace context header from an HTTP response to an HTTP request, you copy these headers.
91
+
92
+
#### Traceparent header
93
+
The traceparent header represents the incoming request in a tracing system in a common format, understood by all vendors.
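
As an illustration of that common format, the sketch below parses a version-`00` `traceparent` value into its four fields. The field layout comes from the W3C Trace Context specification rather than from this page, the sample value in the usage note is the one used in that specification, and the helper names are invented for this example. The parser is strict about the four-field form and does not attempt to handle future versions.

```python
import re
from typing import NamedTuple, Optional

# version - trace-id - parent-id - trace-flags, all lowercase hex.
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed fields, or None when the value is malformed."""
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero ids are invalid according to the specification.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))
```

For example, `parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")` yields trace id `0af7651916cd43dd8448eb211c80319c`, parent id `b7ad6b7169203331`, and `sampled=True`.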