
Releases: apollographql/router

v2.0.0-preview.4

16 Jan 21:44
be6736b
Pre-release

v1.59.1

08 Jan 15:23
1230fa1

🐛 Fixes

Fix transmitted header value for Datadog priority sampling resolution (PR #6017)

The router now transmits correct values of x-datadog-sampling-priority to downstream services.

Previously, an x-datadog-sampling-priority of -1 was incorrectly converted to 0 for downstream requests, and 2 was incorrectly converted to 1. As a result, USER_REJECT (-1) was transmitted to downstream services as AUTO_REJECT (0), and USER_KEEP (2) as AUTO_KEEP (1).
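
The conversion described above can be illustrated with a minimal Python sketch. It is not the router's code: the function names are hypothetical, and the old behavior is modelled as a clamp to the range 0..1, which is an assumption that merely reproduces the conversions listed in this note.

```python
from enum import IntEnum


class SamplingPriority(IntEnum):
    """Datadog sampling priority values carried in x-datadog-sampling-priority."""
    USER_REJECT = -1
    AUTO_REJECT = 0
    AUTO_KEEP = 1
    USER_KEEP = 2


def previous_behavior(incoming: int) -> int:
    # Assumption: modelled as a clamp to [0, 1], which reproduces the
    # conversions described in this note (-1 -> 0 and 2 -> 1).
    return max(0, min(1, int(incoming)))


def fixed_behavior(incoming: int) -> int:
    # The value is validated and then forwarded unchanged, so user
    # decisions (USER_REJECT, USER_KEEP) reach downstream services intact.
    return SamplingPriority(incoming).value


for priority in SamplingPriority:
    print(priority.name, previous_behavior(priority), fixed_behavior(priority))
```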

Enable accurate Datadog APM metrics (PR #6017)

The router supports a new preview feature, the preview_datadog_agent_sampling option, to enable sending all spans to the Datadog Agent so APM metrics and views are accurate.

Previously, the sampler option in telemetry.exporters.tracing.common.sampler wasn't Datadog-aware. To get accurate Datadog APM metrics, all spans must be sent to the Datadog Agent with a psr or sampling.priority attribute set appropriately to record the sampling decision.

The preview_datadog_agent_sampling option enables accurate Datadog APM metrics. It should be used when exporting to the Datadog Agent, via OTLP or Datadog-native.

telemetry:
  exporters:
    tracing:
      common:
        # Only 10 percent of spans will be forwarded from the Datadog agent to Datadog. Experiment to find a value that is good for you!
        sampler: 0.1
        # Send all spans to the Datadog agent.
        preview_datadog_agent_sampling: true

Using these options can decrease your Datadog bill, because you will be sending only a percentage of spans from the Datadog Agent to Datadog.

Important

  • Users must enable preview_datadog_agent_sampling to get accurate APM metrics. Users who have been using recent versions of the router will have to modify their configuration to retain full APM metrics.
  • The router doesn't support in-agent ingestion control.
  • Configuring traces_per_second in the Datadog Agent won't dynamically adjust the router's sampling rate to meet the target rate.
  • Sending all spans to the Datadog Agent may require that you tweak the batch_processor settings in your exporter config. This applies to both OTLP and Datadog native exporters.

Learn more about the configuration options and their implications in the updated Datadog tracing documentation.

Fix non-parent sampling (PR #6481)

When the user specifies a non-parent sampler, the router should ignore the sampling information from upstream and use its own sampling rate.

The following configuration would not work correctly:

telemetry:
  exporters:
    tracing:
      common:
        service_name: router
        sampler: 0.00001
        parent_based_sampler: false

Previously, with this configuration, all spans were sampled.
This is now fixed, and the router correctly ignores any upstream sampling decision.
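
As a rough illustration of the intended behavior, the following Python sketch separates the parent-based path from the router's own ratio-based decision. It is only a model under stated assumptions: the function names are hypothetical, and the ratio check is written in the style of an OpenTelemetry trace-ID ratio sampler, which may differ from the router's actual implementation.

```python
from typing import Optional


def ratio_sample(trace_id: int, ratio: float) -> bool:
    # Deterministic trace-ID ratio sampling (assumed scheme): keep the trace
    # when the low 64 bits of the ID fall below ratio * 2**64.
    return (trace_id & (2**64 - 1)) < int(ratio * 2**64)


def should_sample(
    trace_id: int,
    ratio: float,
    parent_based_sampler: bool,
    upstream_sampled: Optional[bool] = None,
) -> bool:
    # With a parent-based sampler, an upstream decision (if present) wins.
    if parent_based_sampler and upstream_sampled is not None:
        return upstream_sampled
    # With parent_based_sampler: false, the upstream decision is ignored
    # and only the configured sampling rate applies.
    return ratio_sample(trace_id, ratio)
```

With `parent_based_sampler` set to false, an upstream "sampled" flag no longer forces the span to be kept; only the configured `sampler` rate matters.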

By @BrynCooke in #6481

v1.59.1-rc.0

19 Dec 10:25
Pre-release

v1.59.0

17 Dec 11:19
3bdcacd

Important

If you have enabled distributed query plan caching, updates to the query planner in this release will result in query plan caches being regenerated rather than reused. On account of this, you should anticipate additional cache regeneration cost when updating to this router version while the new query plans come into service.

🚀 Features

General availability of native query planner

The router's native, Rust-based, query planner is now generally available and enabled by default.

The native query planner achieves better performance for a variety of graphs. In our tests, we observe:

  • 10x median improvement in query planning time (observed via apollo.router.query_planning.plan.duration)
  • 2.9x improvement in router’s CPU utilization
  • 2.2x improvement in router’s memory usage

Note: you can expect generated plans and subgraph operations in the native query planner to have slight differences when compared to the legacy, JavaScript-based query planner. We've ascertained these differences to be semantically insignificant, based on comparing ~2.5 million known unique user operations in GraphOS as well as comparing ~630 million operations across actual router deployments in shadow mode over a four-month period.

The native query planner supports Federation v2 supergraphs. If you are using Federation v1 today, see our migration guide on how to update your composition build step. Subgraph changes are typically not needed.

The legacy, JavaScript-based query planner is deprecated in this release, but you can still switch back to it if you are still using a Federation v1 supergraph:

experimental_query_planner_mode: legacy

Note: The subgraph operations generated by the query planner are not guaranteed to be consistent from release to release. We strongly recommend against relying on the shape of planned subgraph operations, as new router features and optimizations will continuously affect it.

By @sachindshinde, @goto-bus-stop, @duckki, @TylerBloom, @SimonSapin, @dariuszkuc, @lrlna, @clenfest, and @o0Ignition0o.

Ability to skip persisted query list safelisting enforcement via plugin (PR #6403)

If safelisting is enabled, a router_service plugin can skip enforcement of the safelist (including the require_id check) by adding the key apollo_persisted_queries::safelist::skip_enforcement with value true to the request context.

Note: this doesn't affect the logging of unknown operations by the persisted_queries.log_unknown option.

In cases where an operation would have been denied but is allowed due to the context key existing, the attribute persisted_queries.safelist.enforcement_skipped is set on the apollo.router.operations.persisted_queries metric with value true.
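
The interaction between the context key and the metric attribute can be sketched as follows. This is a simplified Python model, not plugin code: the function name and the dictionary-based context are hypothetical, while the context key and attribute name are the ones given above.

```python
SKIP_KEY = "apollo_persisted_queries::safelist::skip_enforcement"
SKIPPED_ATTRIBUTE = "persisted_queries.safelist.enforcement_skipped"


def check_safelist(operation_is_safelisted: bool, context: dict) -> dict:
    # Enforcement is skipped only when the context key is set to true.
    skip = context.get(SKIP_KEY) is True
    allowed = operation_is_safelisted or skip

    # The attribute is recorded only when the operation would have been
    # denied but is allowed because of the context key.
    metric_attributes = {}
    if skip and not operation_is_safelisted:
        metric_attributes[SKIPPED_ATTRIBUTE] = True

    return {"allowed": allowed, "metric_attributes": metric_attributes}
```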

By @glasser in #6403

Add fleet awareness plugin (PR #6151)

A new fleet_awareness plugin has been added that reports telemetry to Apollo about the configuration and deployment of the router.

The reported telemetry includes CPU and memory usage, CPU frequency, and other deployment characteristics such as operating system and cloud provider. For more details, along with a full list of data captured and how to opt out, go to our data privacy policy.

By @jonathanrainer, @nmoutschen, @loshz in #6151

Add fleet awareness schema metric (PR #6283)

The router now supports the apollo.router.instance.schema metric for its fleet_detector plugin. It has two attributes: schema_hash and launch_id.

By @loshz and @nmoutschen in #6283

Support client name for persisted query lists (PR #6198)

The persisted query manifest fetched from Apollo Uplink can now contain a clientName field in each operation. Two operations with the same id but different clientName are considered to be distinct operations, and they may have distinct bodies.

The router resolves the client name by taking the first from the following that exists:

  • Reading the apollo_persisted_queries::client_name context key that may be set by a router_service plugin
  • Reading the HTTP header named by telemetry.apollo.client_name_header, which defaults to apollographql-client-name

If a client name can be resolved for a request, the router first tries to find a persisted query with the specified ID and the resolved client name.

If there is no operation with that ID and client name, or if a client name cannot be resolved, the router tries to find a persisted query with the specified ID and no client name specified. This means that existing PQ lists that don't contain client names will continue to work.
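
The lookup order described above can be sketched in Python. This is an illustrative model only: the manifest is assumed to be a dictionary keyed by (id, client name), headers are assumed to be stored with lowercase names, and the function names are hypothetical.

```python
from typing import Optional

CLIENT_NAME_KEY = "apollo_persisted_queries::client_name"
DEFAULT_CLIENT_NAME_HEADER = "apollographql-client-name"


def resolve_client_name(
    context: dict, headers: dict, header_name: str = DEFAULT_CLIENT_NAME_HEADER
) -> Optional[str]:
    # The context key set by a plugin takes precedence over the HTTP header.
    return context.get(CLIENT_NAME_KEY) or headers.get(header_name)


def find_operation(
    manifest: dict, operation_id: str, client_name: Optional[str]
) -> Optional[str]:
    # First try the entry specific to this client, if a name was resolved.
    if client_name is not None:
        body = manifest.get((operation_id, client_name))
        if body is not None:
            return body
    # Fall back to the entry with no client name, so lists without
    # client names keep working.
    return manifest.get((operation_id, None))
```

For example, a manifest holding both `("abc", "ios")` and `("abc", None)` returns the iOS-specific body for requests from the `ios` client and the generic body for everyone else.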

To learn more, go to persisted queries docs.

By @glasser in #6198

🐛 Fixes

Fix coprocessor empty body object panic (PR #6398)

Previously, the router would panic if a coprocessor responded with an empty body object at the supergraph stage:

{
  ... // other fields
  "body": {} // empty object
}

This has been fixed in this release.

Note: the previous issue didn't affect coprocessors that responded with well-formed responses.

By @BrynCooke in #6398

Ensure cost directives are picked up when not explicitly imported (PR #6328)

With the recent composition changes, importing @cost results in a supergraph schema with the cost specification import at the top. The @cost directive itself is not explicitly imported, as it's expected to be available as the default export from the cost link. In contrast, uses of @listSize do translate to an explicit import in the supergraph.

Old SDL link

@link(
    url: "https://specs.apollo.dev/cost/v0.1"
    import: ["@cost", "@listSize"]
)

New SDL link

@link(url: "https://specs.apollo.dev/cost/v0.1", import: ["@listSize"])

Instead of using the directive names from the import list in the link, the directive names now come from SpecDefinition::directive_name_in_schema, which is equivalent to the change we made on the composition side.

By @tninesling in #6328

Fix query hashing algorithm (PR #6205)

The router includes a schema-aware query hashing algorithm designed to return the same hash across schema updates if the query remains unaffected. This update enhances the algorithm by addressing various corner cases to improve its reliability and consistency.

By @Geal in #6205

Fix typo in persisted query metric attribute (PR #6332)

The apollo.router.operations.persisted_queries metric reports an attribute when a persisted query was not found.
Previously, the attribute name was persisted_quieries.not_found, with one i too many. Now it's persisted_queries.not_found.

By @goto-bus-stop in #6332

Fix telemetry instrumentation using supergraph query selector (PR #6324)

Previously, router telemetry instrumentation that used query selectors could log errors with messages such as "this is a bug and should not happen".

These errors have now been fixed, and configurations with query selectors such as the following work properly:

telemetry:
  exporters:
    metrics:
      common:
        views:
          # Define a custom view because operation limits are different than the default latency-oriented view of OpenTelemetry
          - name: oplimits.*
            aggregation:
              histogram:
                buckets:
                  - 0
                  - 5
                  - 10
                  - 25
                  - 50
                  - 100
                  - 500
                  - 1000
  instrumentation:
    instruments:
      supergraph:
        oplimits.aliases:
          value:
            query: aliases
          type: histogram
          unit: number
          description: "Aliases for an operation"
        oplimits.depth:
          value:
            query: depth
          type: histogram
          unit: number
          description: "Depth for an operation"
        oplimits.height:
          value:
            query: height
          type: histogram
          unit: number
          description: "Height for an operation"
        oplimits.root_fields:
          value:
            query: root_fields
          type: histogram
          unit: number
          description: "Root fields for an operation"

By @bnjjj in #6324

More consistent attributes on apollo.router.operations.persisted_queries metric (PR #6403)

Version 1.28.1 added several unstable metrics, including `apollo.router.operations.persisted_que...


v1.59.0-rc.0

11 Dec 13:48
Pre-release

v2.0.0-preview.3

09 Dec 22:05
ccc036b
Pre-release

v1.58.1

06 Dec 08:05
12784d4

Important

If you have enabled Distributed query plan caching, this release contains changes which necessarily alter the hashing algorithm used for the cache keys. On account of this, you should anticipate additional cache regeneration cost when updating between these versions while the new hashing algorithm comes into service.

🐛 Fixes

Particular supergraph telemetry customizations using the query selector do not error (PR #6324)

Telemetry customizations like those featured in the request limits telemetry documentation now work as intended when using the query selector on the supergraph layer. Previously, these customizations sometimes caused a "this is a bug and should not happen" error.

By @bnjjj in #6324

Native query planner now receives both "plan" and "path" limits configuration (PR #6316)

The native query planner now correctly sets two experimental configuration options for limiting query planning complexity. These were previously available in the configuration and observed by the legacy planner, but were not being passed to the new native planner until now:

  • supergraph.query_planning.experimental_plans_limit
  • supergraph.query_planning.experimental_paths_limit

By @goto-bus-stop in #6316

v1.58.1-rc.1

05 Dec 16:47
Pre-release

v1.58.1-rc.0

04 Dec 14:33
Pre-release

v1.58.0

27 Nov 17:37
d5b17f1

Important

If you have enabled Distributed query plan caching, this release contains changes which necessarily alter the hashing algorithm used for the cache keys. On account of this, you should anticipate additional cache regeneration cost when updating between these versions while the new hashing algorithm comes into service.

🚀 Features

Support DNS resolution strategy configuration (PR #6109)

The router now supports a configurable DNS resolution strategy for the URLs of coprocessors and subgraphs.
The new option is called dns_resolution_strategy and supports the following values:

  • ipv4_only - Only query for A (IPv4) records.
  • ipv6_only - Only query for AAAA (IPv6) records.
  • ipv4_and_ipv6 - Query for both A (IPv4) and AAAA (IPv6) records in parallel.
  • ipv6_then_ipv4 - Query for AAAA (IPv6) records first; if that fails, query for A (IPv4) records.
  • ipv4_then_ipv6 (default) - Query for A (IPv4) records first; if that fails, query for AAAA (IPv6) records.

You can change the DNS resolution strategy applied to a subgraph's URL:

traffic_shaping:
  all:
    dns_resolution_strategy: ipv4_then_ipv6

You can also change the DNS resolution strategy applied to a coprocessor's URL:

coprocessor:
  url: http://coprocessor.example.com:8081
  client:
    dns_resolution_strategy: ipv4_then_ipv6
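
To make the five strategies concrete, here is a minimal Python sketch of how each one orders its lookups. It is not the router's resolver: the `resolve` and `system_lookup` names are hypothetical, the lookup function is injectable so the logic can be exercised without network access, and the ipv4_and_ipv6 case queries sequentially for brevity rather than in parallel.

```python
import socket
from typing import Callable, List

IPV4 = socket.AF_INET    # A records
IPV6 = socket.AF_INET6   # AAAA records

Lookup = Callable[[str, int], List[str]]


def system_lookup(host: str, family: int) -> List[str]:
    # Resolve through the operating system; raises OSError on failure.
    infos = socket.getaddrinfo(host, None, family=family, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


def resolve(host: str, strategy: str, lookup: Lookup = system_lookup) -> List[str]:
    if strategy == "ipv4_only":
        return lookup(host, IPV4)
    if strategy == "ipv6_only":
        return lookup(host, IPV6)
    if strategy == "ipv4_and_ipv6":
        # Simplification: sequential queries; a failure of one family is tolerated.
        results: List[str] = []
        for family in (IPV4, IPV6):
            try:
                results.extend(lookup(host, family))
            except OSError:
                pass
        if not results:
            raise OSError(f"no addresses found for {host}")
        return results
    fallback_order = {
        "ipv4_then_ipv6": (IPV4, IPV6),
        "ipv6_then_ipv4": (IPV6, IPV4),
    }
    if strategy not in fallback_order:
        raise ValueError(f"unknown strategy: {strategy}")
    first, second = fallback_order[strategy]
    try:
        return lookup(host, first)
    except OSError:
        # The second family is queried only if the first lookup fails.
        return lookup(host, second)
```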

By @IvanGoncharov in #6109

Configuration options for HTTP/1 max headers and buffer limits (PR #6194)

This update introduces configuration options that allow you to adjust the maximum number of HTTP/1 request headers and the maximum buffer size allocated for headers.

By default, the router accepts HTTP/1 requests with up to 100 headers and allocates ~400 KiB of buffer space to store them. If you need to handle requests with more headers or require a different buffer size, you can now configure these limits in the router's configuration file:

limits:
  http1_request_max_headers: 200
  http1_request_max_buf_size: 200kib

If you are using the router as a Rust crate, the http1_request_max_buf_size option requires the hyper_header_limits feature and also necessitates using Apollo's fork of the Hyper crate until the changes are merged upstream.
You can include this fork by adding the following patch to your Cargo.toml file:

[patch.crates-io]
"hyper" = { git = "https://github.com/apollographql/hyper.git", tag = "header-customizations-20241108" }

By @IvanGoncharov in #6194

Compress subgraph operations by generating fragments (PR #6013)

The router now compresses operations sent to subgraphs by default by generating fragment definitions and using them in the operation.

This change enables generate_query_fragments by default while disabling experimental_reuse_query_fragments. When enabled, experimental_reuse_query_fragments attempts to intelligently reuse the fragment definitions from the original operation. However, fragment generation with generate_query_fragments is much faster and produces better outputs in most cases.

If you are relying on the shape of fragments in your subgraph operations or tests, you can opt out of the new algorithm with the configuration below.

Note: The subgraph operations generated by the query planner are not guaranteed to be consistent from release to release. We strongly recommend against relying on the shape of planned subgraph operations, as new router features and optimizations will continuously affect it. We plan to remove experimental_reuse_query_fragments in a future release.

supergraph:
  generate_query_fragments: false
  experimental_reuse_query_fragments: true

By @lrlna in #6013

Add subgraph request id (PR #5858)

The router now supports a subgraph request ID that is a unique string identifying a subgraph request and response. It allows plugins and coprocessors to keep some state per subgraph request by matching on this ID. It's available in coprocessors as subgraphRequestId and Rhai scripts as request.subgraph.id and response.subgraph.id.

By @Geal in #5858

Add extensions.service for all subgraph errors (PR #6191)

For improved debuggability, the router now supports adding a subgraph's name as an extension to all errors originating from the subgraph.

If include_subgraph_errors is true for a particular subgraph, all errors originating in this subgraph will have the subgraph's name exposed as a service extension.

You can enable subgraph errors with the following configuration:

include_subgraph_errors:
  all: true # Propagate errors from all subgraphs

Note: This option is enabled by default by the router's dev mode.

Consequently, when a subgraph returns an error, it will have a service extension with the subgraph name as its value. The following example shows the extension for a products subgraph:

{
  "data": null,
  "errors": [
    {
      "message": "Invalid product ID",
      "path": [],
      "extensions": {
        "service": "products"
      }
    }
  ]
}
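
The effect on each error can be modelled with a short Python sketch. The `add_service_extension` helper is hypothetical and only covers the enabled case described here; how errors are handled when include_subgraph_errors is false is outside its scope.

```python
from typing import List


def add_service_extension(
    errors: List[dict], subgraph_name: str, include_subgraph_errors: bool
) -> List[dict]:
    if not include_subgraph_errors:
        # Out of scope for this sketch: errors are returned untouched.
        return errors
    annotated = []
    for error in errors:
        # Keep any existing extensions and add the subgraph name as "service".
        extensions = {**error.get("extensions", {}), "service": subgraph_name}
        annotated.append({**error, "extensions": extensions})
    return annotated
```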

By @IvanGoncharov in #6191

Add @context support in the native query planner (PR #6310)

The @context feature is now available in the native query planner.
This brings the native query planner to feature parity with the legacy query planner for all Federation v2 graphs. The native query planner can be enabled with the following configuration:

experimental_query_planner_mode: new

By @clenfest, @TylerBloom in #6310

🐛 Fixes

Remove noisy demand control logs (PR #6192)

Demand control no longer logs warnings when a subgraph response is missing a requested field.

By @tninesling in #6192

Renamed headers' original values can again be propagated (PR #6281)

PR #4535 introduced a regression where the following header propagation config would not work:

headers:
- propagate:
    named: a
    rename: b
- propagate:
    named: a
    rename: c

The goal of the original PR was to prevent multiple headers from being mapped to a single target header. However, it did not consider renames and instead prevented multiple mappings from the same source header.
The router now propagates headers properly and ensures that each target header is propagated to only once.
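
The corrected rule can be sketched in Python: the uniqueness check applies to the target header name rather than the source. The `propagate` function is hypothetical, and letting the first matching rule win when two rules share a target is an assumption, not documented router behavior.

```python
from typing import Dict, List


def propagate(incoming: Dict[str, str], rules: List[dict]) -> Dict[str, str]:
    outgoing: Dict[str, str] = {}
    for rule in rules:
        source = rule["named"]
        target = rule.get("rename", source)
        # The same source may feed several targets, but each target is
        # written at most once (assumption: the first matching rule wins).
        if source in incoming and target not in outgoing:
            outgoing[target] = incoming[source]
    return outgoing
```

With the configuration above, an incoming header `a` is forwarded as both `b` and `c`.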

By @BrynCooke in #6281

Introspection response deduplication should always produce results (Issue #6249)

To reduce CPU usage, query planning and introspection queries are deduplicated. In some cases, deduplicated introspection queries were not receiving their result. This issue has been fixed, and the router now sends results in all cases.

By @Geal in #6257

Don't log response data upon notification failure for subgraph batching (PR #6150)

For a subgraph batching operation, the router no longer logs the entire subgraph response when it fails to notify a waiting batch participant. This avoids logging the large amount of data (PII and/or non-PII) that a subgraph response may contain.

By @garypen in #6150

Move heavy computation to a thread pool with a priority queue (PR #6247)

The router now avoids blocking threads when executing asynchronous code by using a thread pool with a priority queue.

This improves the performance of the following components that can take non-trivial amounts of CPU time:

  • GraphQL parsing
  • GraphQL validation
  • Query planning
  • Schema introspection

The size of the thread pool is based on the number of available CPU cores.

The thread pool replaces the router's prior implementation that used Tokio’s spawn_blocking.

apollo.router.compute_jobs.queued is a new gauge metric for the number of items in the thread pool's priority queue.

Note: when the native query planner is enabled, the dedicated queue of the legacy query planner is no longer used, so the apollo.router.query_planning.queued metric is no longer emitted.
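
The general technique (a fixed pool of worker threads fed from a priority queue, with a gauge for queued items) can be sketched in Python. This is an illustration of the pattern, not the router's Rust implementation; the `ComputePool` class and its methods are hypothetical, and lower numbers are assumed to mean higher priority.

```python
import itertools
import os
import queue
import threading
from concurrent.futures import Future


class ComputePool:
    def __init__(self, workers=None):
        self._queue = queue.PriorityQueue()
        # Tie-breaker: keeps FIFO order within a priority and avoids
        # comparing job callables.
        self._order = itertools.count()
        # Pool size defaults to the number of available CPU cores.
        for _ in range(workers or os.cpu_count() or 1):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, priority, job):
        # Lower priority numbers are dequeued first.
        future = Future()
        self._queue.put((priority, next(self._order), job, future))
        return future

    def queued(self):
        # Analogous in spirit to a "queued jobs" gauge.
        return self._queue.qsize()

    def _run(self):
        while True:
            _, _, job, future = self._queue.get()
            try:
                future.set_result(job())
            except Exception as error:
                future.set_exception(error)
```

Callers on an event loop would submit CPU-heavy work such as parsing or planning to the pool and await the returned future, so the loop's own threads are never blocked.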

By @SimonSapin in #6247

Limit the amount of GraphQL validation errors returned per response (PR #6187)

When an invalid query is submitted, the router now returns at most 100 GraphQL parsing and validation errors in a response. This prevents the router from generating an excessively large response for a nonsensical document.

By @goto-bus-stop in https://github.com/apollograp...
