Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(vault): secret is temporarily empty after changed vault config #14209

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

cshuaimin
Copy link

Summary

When vault config is changed, in the worker event we first flush the LRU cache, then start to update the secrets from vault provider. There’s a period of time in between that the cache is empty.
The kong.vault.update() function only lookups cache and will update the secret to an empty string when cache is empty. This can cause plugins to throw nil errors.
This commit changed kong.vault.update() function to not touch it if not found in the cache.

Checklist

  • The Pull Request has tests
  • A changelog file has been created under changelog/unreleased/kong or skip-changelog label added on PR if changelog is unnecessary. README.md
  • There is a user-facing docs PR against https://github.com/Kong/docs.konghq.com - PUT DOCS PR HERE

Issue reference

FTI-5936

When vault config is changed, in the worker event we first flush the LRU
cache, then start to update the secrets from vault provider. There’s a
period of time in between that the cache is empty.
The `kong.vault.update()` function only lookups cache and will update
the secret to an empty string when cache is empty. This can cause plugins
to throw nil errors. This commit changed `kong.vault.update()` function
to not touch it if not found in the cache.
@github-actions github-actions bot added core/pdk cherry-pick kong-ee schedule this PR for cherry-picking to kong/kong-ee labels Jan 21, 2025
When vault config is changed, in the worker event we first flush the LRU
cache, then start to update the secrets from vault provider. There’s a
period of time in between that the cache is empty.
The `kong.vault.update()` function only lookups cache and will update
the secret to an empty string when cache is empty. This can cause plugins
to throw nil errors. This commit changed the worker event callback to
not flush the LRU cache. The cache will be updated when the secrets are
fetched from vault prodiver.
@cshuaimin
Copy link
Author

Pushed a new commit to fix the issue in another way: instead of not touching input table in the pdk update function, in this new commit I completely removed the LRU:flush_all() line in the worker event. The LRU capacity is fixed and the keys has ttls, so there likely won’t be memory leaks.

@bungle
Copy link
Member

bungle commented Feb 5, 2025

The LRU capacity is fixed and the keys has ttls, so there likely won’t be memory leaks.

What about the configuration stickyness issue (just double checking that stickyness is not back with this change)?

@@ -1429,8 +1429,6 @@ local function new(self)
end
end

LRU:flush_all()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LuaDoc has:

  ---
  -- Flushes LRU caches and forcibly rotates the secrets.
  --
  -- This is only ever executed on traditional nodes.

I guess this may need to be changed. Also should it be executed with incremental too?

@cshuaimin
Copy link
Author

Hi @bungle sorry for the late reply, I just took another look at this. I think current approach may not be ideal for these reasons:

  1. It's not enough to completely fix the 'could not find cached value' issue, because there's another lru flush in kong.vault.flush() function. In my test, I changed the vault config in Konnect UI and can still see several such errors in a short time even after this change. And I think we can not remove the flush in this place right? Because the function is called flush...
  2. the configuration stickyness issue may come back (not very sure)

Do you think the first commit is okay? The only downside is the behavior of kong.vault.flush() changed slightly. But the function is marked as experimental in the docs, and seems that the usage of the function in kong will not be affected. If it's acceptable, I will revert the second commit and change the failed test "update function sets values to empty string on failure".

@bungle
Copy link
Member

bungle commented Feb 14, 2025

@cshuaimin The first commit is ok with me! Thanks!

The test "resurrects plugin config references when secret is deleted" failed.
It is asserting that when a secret resurrect ttl is passed, it should be
deleted. However with previous update() function logic the value is
only set if it's not nil. This commit changes the logic to set the value
based on if there's an error or not.
@cshuaimin cshuaimin force-pushed the fix/vault-cache-flush-issue branch from 00e069a to 9d82074 Compare February 17, 2025 03:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick kong-ee schedule this PR for cherry-picking to kong/kong-ee core/pdk size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants