When -use-thanos-objstore=true .ruler_storage config partially overrides .ruler.storage ("Ruler storage is not configured" error); ruler must be configured in both places to start #16543
Comments
Make the storage configuration documentation more explicit that the Thanos object store client configuration is mutually exclusive with the legacy configuration. When the Thanos client is enabled, legacy configuration is all silently ignored, including legacy named stores. Document that the .ruler.storage configuration is only deprecated when the thanos storage client is enabled. It is still the required, and only, method of configuring the Ruler's storage when using the legacy object store. Document that .ruler.storage is mostly ignored when the thanos client is enabled, even when only using local storage, but it must still be present due to bug grafana#16543. .ruler_storage is where the actual configuration is read from in thanos mode. Warn that the ruler storage configuration does not support named stores.
This commit seems to try to do the right thing for the check about whether ruler is configured:
from #15345
It's not immediately clear how this would be failing. This code is in:
so should've been working. Issue with

The Lines 1251 to 1257 in 4fa045d
It fails to check for
I'm not sure where the startup-check bug where ruler isn't launched is coming from yet.

Ah, worked it out.
if

Fixes in #16555
Describe the bug
When -use-thanos-objstore=true is passed to Loki to enable the new thanos object store client, the ruler storage configuration is read from a `.ruler_storage` block instead of the `.ruler.storage` block used when the thanos object store is not enabled. This is true even when Loki is configured to use only local storage.

Additionally, the `.ruler.storage` block is still used even when thanos object store is enabled (and thus `.ruler_storage` should be used instead):

- to `mkdir` the rules directory during startup; and
- to check whether the ruler is configured when starting the `all` or `backend` targets.

So when enabling the Thanos client the `.ruler.storage` block must still be present and contain a valid directory path, otherwise Loki will log "Ruler storage is not configured" and fail to start the Ruler.

So the `.ruler.storage` block is not wholly overridden by enabling the Thanos client.

When the Thanos client is enabled, `.ruler_storage` and `.ruler.storage` must BOTH be present, otherwise the `all` and `backend` targets will log `Ruler storage is not configured` and skip ruler startup. The actual configuration is taken only from `.ruler_storage`. This is very confusing. There is no warning logged that `.ruler.storage` is defined but will be ignored.

It looks like `.ruler_storage` is supposed to work like `.storage_config.object_store`, overriding the legacy config when `-use-thanos-objstore=true`. But it doesn't do so completely because the startup mkdir for the rules directory and the startup check for whether the ruler is configured both use the legacy config in `.ruler.storage`, even when using the thanos store.

The docs for the respective blocks are not at all clear about their mutually exclusive use with and without thanos client mode either.
The error when `.ruler.storage`'s path doesn't exist is:
so it's coming from https://github.com/grafana/dskit when invoked by ruler-storage.
Illustrated in config excerpts:
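As a minimal sketch of the situation described above (not the reporter's original excerpts), the combination that lets the Ruler start under `-use-thanos-objstore=true` looks roughly like this. Only the keys named in this report are shown; the directory paths are the ones that appear in the error messages in the reproduction steps, and backend/type selector keys and all other settings are deliberately left out rather than guessed.

```yaml
# Sketch only: just the keys named in this report. Backend/type selector
# keys and every other setting are omitted.

# Legacy block. With -use-thanos-objstore=true this is not where the rule
# store configuration comes from, but the all/backend targets still mkdir
# this path at startup and treat the ruler as unconfigured if the block
# is missing.
ruler:
  storage:
    local:
      directory: /ruler_rules

# Block actually read for ruler storage when -use-thanos-objstore=true.
ruler_storage:
  local:
    directory: /ruler_storage_rules
```

Removing either block while the Thanos client is enabled should reproduce the "Ruler storage is not configured" message described in this report for the `all` and `backend` targets.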
So there are two bugs here, and some docs confusion/UX issues:

- Startup of the `all` and `backend` targets uses only `ruler.storage` when checking whether ruler is configured; it ignores `.ruler_storage` and the thanos client mode.
- Startup of the `all` and `backend` targets uses only `ruler.storage` when trying to mkdir() the rules directory.
- `.ruler.storage` is (aside from the bugs above) ignored when thanos client is enabled, in favour of `.ruler_storage`.
- The docs say `.ruler.storage` is deprecated and to use `.ruler_storage` instead, but actually `.ruler_storage` gets ignored unless the thanos client is enabled.

To Reproduce
Use the attached shell script and loki configuration to explore the permutations of the configuration in a Docker container. Tested with Loki 3.3.2 and with Loki 3.4.2. See comments in `loki-config.yaml` and the script for details. It'll print the docker cli including Loki args when it runs.

Files:
Run
to work around GH's bizarre file extension limits. Then:
1. Run `./validate-config.sh -t all -m run` to see the non-thanos-store behaviour, where `.ruler.storage.local.directory` is used for everything.
2. Run `./validate-config.sh -t all -m -o run` to add `-use-thanos-objstore=true` but not define a `.ruler_storage` block. Loki will log `Ruler storage is not configured; ruler will not be started.` even though `.ruler.storage.local.directory` is still present.
3. Run `./validate-config.sh -t all -m -o -r run` to inject a `.ruler_storage` block. The ruler will now start, but will complain `msg="unable to list rules" err="unable to read dir /ruler_storage_rules: open /ruler_storage_rules: no such file or directory"` because it's using the value from `.ruler_storage.local.directory`, not the one for `.ruler.storage.local.directory`, and we didn't add a tmpfs mount for that.
4. Remove the tmpfs mount for the `.ruler.storage.local.directory` path by omitting the `-m` flag: `./validate-config.sh -t all -o -r run`. Loki will now error-exit with `mkdir /ruler_rules: permission denied\nerror initialising module: ruler-storage` because it's trying to use the `.ruler.storage.local.directory` path in its startup check, even though it'll use `.ruler_storage.local.directory` everywhere else, and should be ignoring this configuration.

Now delete the `.ruler.storage` block completely and re-run `./validate-config.sh -t all -o -r run`. Even though it's using the thanos client and has a `.ruler_storage` block, Loki will log `Ruler storage is not configured; ruler will not be started` and will fail to start the Ruler.

But if you run `./validate-config.sh -t ruler -o -r run` to request just the ruler target be run with the thanos client and `ruler_storage` block, it'll start fine, because it bypasses the broken logic in the all and backend targets that use the wrong configuration.

If you want to repro manually instead:
1. Run Loki with `-target=all -use-thanos-objstore=true` and the original config file. It'll skip starting Ruler.
2. Add a `.ruler_storage` block. It'll start Ruler.
3. Make `.ruler.storage.local.directory` point to something that Loki can't write to. Loki will try to create the dir and fail to start, even though it's otherwise unused.
4. Delete the `.ruler.storage` block completely. Ruler will again skip starting, even though there's a `.ruler_storage` block.
5. Use `--target=ruler` instead of `--target=all`. It'll start fine, because the code that's using the wrong config block seems to be part of the `all` and `backend` targets' startup logic, not Ruler itself.

Expected behavior
I expected `.ruler.storage` to be used whether or not `-use-thanos-objstore=true` is passed, especially when I'm using local ruler storage anyway.

Alternately, when `-use-thanos-objstore=true` is passed and `.ruler.storage` is present, I would have expected to see a "warning: .ruler.storage overridden by .ruler_storage when -use-thanos-objstore=true" and the `.ruler.storage` block completely ignored, rather than still being used for a mkdir on startup and to determine whether or not to start Ruler.

(A related bug is that `.ruler_storage` lacks support for `named_stores`, so it's also inconsistent with `.storage_config.object_store`; see #16544)

Environment:
- Docker images `grafana/loki:3.3.2` and `grafana/loki:3.4.2`. Note that the image has `Entrypoint` set to `/usr/bin/loki` so it's not running any shell script before running Loki.

Screenshots, Promtail config, or terminal output
Inline, above.
See also a related issue with ruler storage configuration not supporting named object stores: #16544
Note that the docs say
but they do not mention that this replaces the `.ruler.storage` block, which is ignored. Nor does `.ruler.storage` mention it's ignored and replaced by `.ruler_storage` when the thanos client is enabled.