diff --git a/CHANGELOG.md b/CHANGELOG.md
index 67f90910ce1..0e2b262156c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -169,6 +169,7 @@

 ### Documentation

+* [CHANGE] Add production tips related to cache size, heavy multi-tenancy, and latency spikes. #9978
 * [BUGFIX] Send native histograms: update the migration guide with the corrected dashboard query for switching between classic and native histograms queries. #10052

 ### Tools
diff --git a/docs/sources/mimir/manage/run-production-environment/production-tips/index.md b/docs/sources/mimir/manage/run-production-environment/production-tips/index.md
index eae5143d4d0..2435d0493a5 100644
--- a/docs/sources/mimir/manage/run-production-environment/production-tips/index.md
+++ b/docs/sources/mimir/manage/run-production-environment/production-tips/index.md
@@ -147,6 +147,12 @@ The chunks caches store portions of time series samples fetched from object stor
 Entries in this cache tend to be large (several kilobytes) and are fetched in batches by the store-gateway components.
 This results in higher bandwidth usage compared to other caches.

+### Cache size
+
+The Memcached [extstore](https://docs.memcached.org/features/flashstorage/) feature lets you extend Memcached's memory space onto flash (or similar) storage.
+
+Refer to [how we scaled Grafana Cloud Logs' Memcached cluster to 50TB and improved reliability](https://grafana.com/blog/2023/08/23/how-we-scaled-grafana-cloud-logs-memcached-cluster-to-50tb-and-improved-reliability/).
+
 ## Security

 We recommend securing the Grafana Mimir cluster.
@@ -176,3 +182,20 @@ To configure gRPC compression, use the following CLI flags or their YAML equival
 | `-ruler.query-frontend.grpc-client-config.grpc-compression` | `ingester_client.grpc_client_config.grpc_compression` |
 | `-alertmanager.alertmanager-client.grpc-compression` | `query_scheduler.grpc_client_config.grpc_compression` |
 | `-ingester.client.grpc-compression` | `ruler.query_frontend.grpc_client_config.grpc_compression` |
+
+## Heavy multi-tenancy
+
+For each tenant, Mimir opens and maintains a TSDB in memory. If you have a significant number of tenants, the memory overhead might become prohibitive.
+To reduce the associated overhead, consider the following:
+
+- Reduce `-blocks-storage.tsdb.head-chunks-write-buffer-size-bytes`, default `4MB`. For example, try `1MB` or `128KB`.
+- Reduce `-blocks-storage.tsdb.stripe-size`, default `16384`. For example, try `256`, or even `64`.
+- Configure [shuffle sharding](https://grafana.com/docs/mimir/latest/configure/configure-shuffle-sharding/).
+
+## Periodic latency spikes when cutting blocks
+
+Depending on the workload, you might experience latency spikes when Mimir cuts blocks.
+To reduce the impact of this behavior, consider the following:
+
+- Upgrade to Mimir `2.15` or later. Refer to .
+- Reduce `-blocks-storage.tsdb.block-ranges-period`, default `2h`. For example, try `1h`.
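As a companion to the "Cache size" tip in this patch: extstore is enabled at memcached startup via the `ext_path` option. A minimal sketch, where the memory limit, path, and flash size are illustrative placeholders, not recommendations:

```shell
# Illustrative only: a cache node with a 4 GB in-memory limit (-m, in MB)
# backed by ~100 GB of local flash. The file path and sizes are placeholders.
# -o ext_path points extstore at a file (or device) on flash storage.
memcached -m 4096 -o ext_path=/var/lib/memcached/extstore:100G
```

Hot items stay in RAM; colder values overflow to the flash-backed file, which suits the large, batch-fetched entries of the chunks cache.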
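The CLI flags in the "Heavy multi-tenancy" and "Periodic latency spikes" tips can also be set in YAML. A sketch under the assumption that the keys follow Mimir's standard CLI-to-YAML mapping (dots become nesting, dashes become underscores); the values are the illustrative ones from the tips, not tuned recommendations:

```yaml
# Illustrative Mimir configuration fragment; values mirror the examples above.
blocks_storage:
  tsdb:
    head_chunks_write_buffer_size_bytes: 1048576  # 1MB, down from the 4MB default
    stripe_size: 256                              # down from the 16384 default
    block_ranges_period: [1h]                     # down from the 2h default
```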