Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(pdk): refine with optional cache to improvement performance #13981

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

ProBrian
Copy link
Contributor

@ProBrian ProBrian commented Dec 5, 2024

Summary

Checklist

  • The Pull Request has tests
  • A changelog file has been created under changelog/unreleased/kong or skip-changelog label added on PR if changelog is unnecessary. README.md
  • There is a user-facing docs PR against https://github.com/Kong/docs.konghq.com - PUT DOCS PR HERE

Issue reference

Fix #[issue number]

@@ -619,14 +617,14 @@ local function new(self)
-- kong.request.get_header("Host") -- "foo.com"
-- kong.request.get_header("x-custom-header") -- "bla"
-- kong.request.get_header("X-Another") -- "foo bar"
function _REQUEST.get_header(name)
function _REQUEST.get_header(name, ctx)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http_get_header supports ctx param to cache headers, expose it to get_header API so we could use cache.

Copy link
Member

@bungle bungle Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this either. If we add this parameter, we need to always keep the API like that, that the second parameter needs to be ctx. The benefits are so small that I would rather not do it. The most of the changes in plugins seems to be utilizing this. And that is also I think somewhat "bad" code to make people create tables of ctx and pass it here. If we cannot find better way to do this, then I would not do it.

Just an example e.g. if someone wants to add more beneficial parameter lilke default in case the header is missing the code would become:

local value = kong.request.get_header("Some-Header", nil, "default")

Perhaps we will never add that either, but just tells that we don't want these strange ctx parameters in public API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we use kong.tools.get_header(name, ctx) directly in plugins, to replace non-cached kong.request.get_header API in certain places to achieve the cache benefit? It must be better to use PDK APIs in plugins, but I do see that there are a lot of kong.tools usage in plugins(which I feel more like should be used "internally" in core or some non-plugin module?), Do you think it's a proper practice?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ProBrian In general it would be best if we can get to state where there is no require "module" statements in plugins. That is we have a powerful PDK. But meanwhile, yes, many plugins require different modules. That also then spreads to all custom plugins as plugins are copy pasted. That then will make all our internal APIs and tools public too which we cannot break. I think for example those plugins, just call get_headers once and work with headers and call it done, rather than pass a table to somewhere that something else caches the get_headers.

kong/pdk/request.lua Outdated Show resolved Hide resolved
@@ -285,10 +285,11 @@ local function new(pdk, major_version)
-- -- or `access` phase prior calling this function.
--
-- local body = kong.service.response.get_raw_body()
function response.get_raw_body()
function response.get_raw_body(buffered_proxying)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a minor refinement so that if we could reduce the reads of ngx.ctx.buffered_proxying by passing it by param

@@ -158,21 +158,22 @@ end

local function do_authentication(conf)
local www_authenticate = "Basic realm=\"" .. conf.realm .. "\""
local ctx = {}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a local ctx here which is cheaper thanngx.ctx, and the cache is only fresh in the local ctx lifetime, not in the request lifetime, so we don't need ngx.ctx

@ProBrian ProBrian requested a review from bungle December 16, 2024 03:11
Copy link
Member

@bungle bungle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's revert all the API changes and related changes in where the changed APIs were used. Benefit vs. drawbacks are too small. Those optimizations can be implemented inside plugins themselves if they are that good. In many cases it also depends whether plugin is reading single or multiple headers to what to use, eg. is. get_headers better than multiple get_header and is multiple get_header better with created context table at what count?

@@ -619,14 +617,14 @@ local function new(self)
-- kong.request.get_header("Host") -- "foo.com"
-- kong.request.get_header("x-custom-header") -- "bla"
-- kong.request.get_header("X-Another") -- "foo bar"
function _REQUEST.get_header(name)
function _REQUEST.get_header(name, ctx)
Copy link
Member

@bungle bungle Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this either. If we add this parameter, we need to always keep the API like that, that the second parameter needs to be ctx. The benefits are so small that I would rather not do it. The most of the changes in plugins seems to be utilizing this. And that is also I think somewhat "bad" code to make people create tables of ctx and pass it here. If we cannot find better way to do this, then I would not do it.

Just an example e.g. if someone wants to add more beneficial parameter lilke default in case the header is missing the code would become:

local value = kong.request.get_header("Some-Header", nil, "default")

Perhaps we will never add that either, but just tells that we don't want these strange ctx parameters in public API.

kong/pdk/service/response.lua Outdated Show resolved Hide resolved
@ProBrian
Copy link
Contributor Author

Let's revert all the API changes and related changes in where the changed APIs were used. Benefit vs. drawbacks are too small. Those optimizations can be implemented inside plugins themselves if they are that good. In many cases it also depends whether plugin is reading single or multiple headers to what to use, eg. is. get_headers better than multiple get_header and is multiple get_header better with created context table at what count?

Removed all public API changes. @bungle

@bungle
Copy link
Member

bungle commented Jan 31, 2025

@ProBrian I'll setup a meeting, where we can discuss how to get this in.

@chronolaw chronolaw changed the title Pdk refine with optional cache to improvement performance perf(pdk): refine with optional cache to improvement performance Feb 7, 2025
kong/globalpatches.lua Outdated Show resolved Hide resolved
@chronolaw
Copy link
Contributor

Do we have any data about improvements?

@@ -116,7 +116,7 @@ X-Missing: nil
=== TEST 4: response.get_header() returns nil when response header does not fit in default max_headers
=== TEST 4: response.get_header() returns existing header will not limited by get_headers() default max_headers configuration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a new feature?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a new feature, just a refine, since we use ngx.header[name] to replace _RESPONSE.get_headers()[name] in the get_header implementation, the max headers limitation brought by get_headers() will not exist anymore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this extends the capability of this PDK function. Shall we consider this a new feature?

Copy link
Contributor Author

@ProBrian ProBrian Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thinks it's not needed to consider this a new feature, because

  1. the limitation is due to our internal implementation.
  2. the limitation is not documented which means users aren't even aware of this limitation.

that means we have no interface changes and no documentation changes from user's perspective.

@ProBrian
Copy link
Contributor Author

ProBrian commented Feb 7, 2025

Do we have any data about improvements?

I did a preliminary test on aws, you can refer to https://konghq.atlassian.net/wiki/spaces/~712020faf780811dc24bb28ad1df7390c03c41/pages/4217667620/A+Journey+to+PDK+Caching#One-more-try-for-the-2nd-half, but that's not a standard test environment. I think we should run a perf test after this PR merged to see.

kong/globalpatches.lua Outdated Show resolved Hide resolved
@@ -51,7 +50,7 @@ function grpc_web:access(conf)
end

local dec, err = deco.new(
kong_request_get_header("Content-Type"),
ngx_var.http_content_type,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may not correct, since now nginx could return multiple headers in one value, like "a, b, c".
could you check other places? thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bungle , could you verify it? thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@bungle bungle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this and one assumption was not correct.

@@ -820,8 +818,7 @@ local function new(self)
-- body.age -- "42"
function _REQUEST.get_body(mimetype, max_args, max_allowed_file_size)
check_phase(before_content)

local content_type = mimetype or _REQUEST.get_header(CONTENT_TYPE)
local content_type = mimetype or ngx.var.http_content_type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ProBrian

ngx.var.content_type does not seem to be single header. If you send multiple, you will get "<value1>, <value2>, ...".

Copy link
Member

@bungle bungle Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is even more annoying is that:

ngx.req.set_header("Content-Type", "value")

is only reflected with the first value. Seems like clear_header is needed together with set_header?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to work properly:

local ct = kong.request.get_header("Content-Type")
ngx.req.clear_header("Content-Type")
ngx.req.set_header("Content-Type", ct)

If I don't do clear_header, it does not work properly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the request I am using:

http -v :8001 Content-Type:text/html Content-Type:text/json

Copy link
Contributor Author

@ProBrian ProBrian Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems there's a mismatch limitation between openresty and nginx that I've ignored, that is,

  • nginx allows multiple line values for headers like "Content-Type".

  • And openresty defines a list of headers that is "built-in single value", which includes "Content-Type". ngx.req.set_header will not allow multiple value for "Content-Type", so if we ngx.req.set_header("Content-Type", {"json1", "json2"}) , it will use the last elem as the single value, so then ngx.req.get_headers()["Content-Type"] and ngx.var.http_content_type will all get a single value "json2" .

  • But when multiple content type header is sent by http request client, not by openresty set_header API, then the header value is just handled by nginx, and get_headers will get a table {"json1", "json2"} and ngx.var.http_content_type will get the value as comma separated string: "json1, json2".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is even more annoying is that:

ngx.req.set_header("Content-Type", "value")

is only reflected with the first value. Seems like clear_header is needed together with set_header?

That's due to the implementation of openresty, it assumes that Content-Type should be single value, so in set_header implementation, the assignment will only take place in the first index of value list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will rollback the var.http_content_type changes, since kong.request.get_header will only get the first value no matter content-type is multi value or not from downstream; but var.http_content_type rely on the assumption that downstream send single content-type or content-type is set by ngx.req.set_header.

@@ -244,7 +244,7 @@ local function new(self, major_version)
error("header name must be a string", 2)
end

local header_value = _RESPONSE.get_headers()[name]
local header_value = ngx.header[name]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep the original code here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ngx.header is the better way to get a single response header, and it returns the same type of values as get_headers()[name]: if get_headers()[name] get a table, ngx.header will get a table similarly(unlike ngx.var.sent_http_ which returns comma separated string). So we make this change for better performance.

kong/plugins/aws-lambda/request-util.lua Outdated Show resolved Hide resolved
@@ -89,7 +89,7 @@ local function configure_origin(conf, header_filter)
end
end

local req_origin = kong.request.get_header("origin")
local req_origin = headers["origin"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you confirm it is ok?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a tricky point, here we want to use headers result as param to reduce duplicated get_header call. Although origin header should not be a multiple value logically, but users may still send http http://xxx/yyy Origin:o1 Origin:o2 from downstream which doesn't quite make sense as I see(also I feel like two Content-Type header in one http request is also not quite logically correct...).
But anyway in this case kong.request.get_header("origin") will get only first value o1 and get_headers()["origin"] will get {o1, o2}.
So the get_header API guarantees we only get a single value no matter it's multiple value or not. But in the other side, its pointless to run get_header multiple times since the header has not been changed. Maybe we could just change this to:
type(headers["origin"]) == "table" and headers["origin"][1] or headers["origin"]
which is the implementation to make header value always single(see https://github.com/Kong/kong/blob/master/kong/tools/http.lua#L564).
@bungle @chronolaw

@@ -181,7 +181,7 @@ local function configure_credentials(conf, allow_all, header_filter)

-- Access-Control-Allow-Origin is '*', must change it because ACAC cannot
-- be 'true' if ACAO is '*'.
local req_origin = kong.request.get_header("origin")
local req_origin = headers["origin"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you confirm it is ok?

if kong.request.get_method() ~= "OPTIONS"
or not kong.request.get_header("Origin")
or not kong.request.get_header("Access-Control-Request-Method")
or not headers["Origin"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you confirm it is ok?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment