cache_req_fsm: keep the cache object's Content-Length for HEAD always #4247

nigoroll · 2025-01-02T15:32:20Z

Previously, we would only keep the Content-Length header for HEAD requests on hit-for-miss objects; now we simply always keep it, to enable "fallback" caching of HEAD requests.

The added vtc implements the basics of the logic to enable the (reasonable) use case documented in #2107 (comment), but using Vary instead of cache key modification plus restart.

Fixes #4245

dridi

LGTM, but this was not a thorough review.

bin/varnishd/cache/cache_req_fsm.c

sbraz · 2025-01-03T20:19:45Z

bin/varnishtest/tests/r04245.vtc

+    sub vcl_backend_fetch {
+	if (bereq.http.X-Fetch-Method) {
+	    set bereq.method = bereq.http.X-Fetch-Method;
+	}


In this example configuration, the X-Fetch-Method header can't be unset here before sending the request to the backend, or it breaks the Vary part, right? I don't mind sending an extra header to my backend, but it's one thing that differs from the restart-based solution.

Correct. We need to (un)set the header before the cache lookup such that the right variant gets hit, if present. For a miss, the header gets copied to the backend request and, when it completes (after vcl_backend_response {} returns), the header's value gets added to the Vary specification for that cache object.

So, in short, the header needs to be present during cache lookup and at the end of vcl_backend_response {}. For practical reasons, it is also needed to signal the backend side to activate the Vary handling.
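To make the timing argument concrete, here is a deliberately simplified C model of Vary matching. It is not Varnish's VRY_* code. It assumes `Vary: X-Fetch-Method` has been set on the object, and that the variant records whatever value X-Fetch-Method had on the backend request when vcl_backend_response {} returned, with NULL meaning the header was absent. The struct and function names are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified model (not Varnish's VRY_* code).
 * A cached variant remembers the X-Fetch-Method value seen on the backend
 * request when vcl_backend_response {} returned; NULL means "absent".
 */
struct variant {
	const char	*x_fetch_method;
};

/* Two header values are equal if both are absent or both have equal text. */
static int
header_values_equal(const char *a, const char *b)
{
	if (a == NULL || b == NULL)
		return (a == b);	/* equal only if both are absent */
	return (strcmp(a, b) == 0);
}

/* Returns 1 if a request whose X-Fetch-Method is req_value may use v. */
static int
vary_match(const struct variant *v, const char *req_value)
{
	return (header_values_equal(v->x_fetch_method, req_value));
}
```

In this model, if the header has already been removed by the time vcl_backend_response {} returns, the variant fetched with HEAD (which carries no body) matches GET lookups without an X-Fetch-Method header, while HEAD lookups miss it.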

With these requirements in mind, we can change the code to not send the header by deleting it in vcl_backend_fetch {} and restoring it in vcl_backend_response {}, but we need a vmod to do so. Here's what the test case adjustment looks like with a taskvar.bool object as a simple marker to activate the Vary handling:

diff --git a/bin/varnishtest/tests/r04245.vtc b/bin/varnishtest/tests/r04245.vtc
index 27244e053..42982b79a 100644
--- a/bin/varnishtest/tests/r04245.vtc
+++ b/bin/varnishtest/tests/r04245.vtc
@@ -13,6 +13,12 @@ server s1 {
 } -start
 
 varnish v1 -vcl+backend {
+    import taskvar;
+
+    sub vcl_init {
+	new vary_x_fetch_method = taskvar.bool();
+    }
+
     sub vcl_recv {
 	if (req.method == "HEAD") {
 	    set req.http.X-Fetch-Method = "HEAD";
@@ -24,13 +30,17 @@ varnish v1 -vcl+backend {
     sub vcl_backend_fetch {
 	if (bereq.http.X-Fetch-Method) {
 	    set bereq.method = bereq.http.X-Fetch-Method;
+	    # use marker to avoid sending the header to the backend
+	    unset bereq.http.X-Fetch-Method;
+	    vary_x_fetch_method.set(true);
 	}
     }
 
     sub vcl_backend_response {
 	# NOTE: this use of Vary is specific to this case, it is
 	# usually WRONG to only set Vary for a specific condition
-	if (bereq.http.X-Fetch-Method) {
+	if (vary_x_fetch_method.get()) {
+	    set bereq.http.X-Fetch-Method = bereq.method;
 	    if (beresp.http.Vary) {
 		set beresp.http.Vary += ", X-Fetch-Method";
 	    } else {

Within the varnish-cache tree, we only use bundled vmods, so this change cannot be applied to the proposed patch.

An even simpler way would be to use bereq.method == "HEAD" as the marker in vcl_backend_response {}, which should be possible if the additional logic is only used for HEAD. That is, it should work exactly as in the test case, but might cause trouble in real-world VCL:

diff --git a/bin/varnishtest/tests/r04245.vtc b/bin/varnishtest/tests/r04245.vtc
index 27244e053..44edbd5bc 100644
--- a/bin/varnishtest/tests/r04245.vtc
+++ b/bin/varnishtest/tests/r04245.vtc
@@ -24,13 +24,15 @@ varnish v1 -vcl+backend {
     sub vcl_backend_fetch {
 	if (bereq.http.X-Fetch-Method) {
 	    set bereq.method = bereq.http.X-Fetch-Method;
+	    unset bereq.http.X-Fetch-Method;
 	}
     }
 
     sub vcl_backend_response {
 	# NOTE: this use of Vary is specific to this case, it is
 	# usually WRONG to only set Vary for a specific condition
-	if (bereq.http.X-Fetch-Method) {
+	if (bereq.method == "HEAD") {
+	    set bereq.http.X-Fetch-Method = bereq.method;
 	    if (beresp.http.Vary) {
 		set beresp.http.Vary += ", X-Fetch-Method";
 	    } else {

nigoroll · 2025-01-06T14:29:27Z

notes from bugwash:

There should be a way for VCL to stop sending C-L.

my own homework:

understand why the current code works for pass


nigoroll · 2025-02-06T11:00:40Z

homework: why does the current code work?

diff --git a/bin/varnishd/cache/cache_req_fsm.c b/bin/varnishd/cache/cache_req_fsm.c
index bbcb3824f..91ec23780 100644
--- a/bin/varnishd/cache/cache_req_fsm.c
+++ b/bin/varnishd/cache/cache_req_fsm.c
@@ -493,6 +493,7 @@ cnt_transmit(struct worker *wrk, struct req *req)
                         * filters have had a chance to chew on it, but that
                         * would negate the "pass for huge objects" use case.
                         */
+                       VSLb(req->vsl, SLT_Debug, "HEAD with OC_F_HFM");
                } else {
                        http_Unset(req->resp, H_Content_Length);
                        if (req->resp_len >= 0)

$ ./varnishtest -iv tests/b00065.vtc | grep -C 5 'HEAD wi'
**** v1    vsl|       1004 RespHeader      c Via: 1.1 v1 (Varnish/trunk)
**** v1    vsl|       1004 VCL_call        c DELIVER
**** v1    vsl|       1004 VCL_return      c deliver
**** v1    vsl|       1004 Timestamp       c Process: 1738839586.533446 0.014269 0.000088
**** v1    vsl|       1004 Filters         c 
**** v1    vsl|       1004 Debug           c HEAD with OC_F_HFM
**** v1    vsl|       1004 RespHeader      c Connection: keep-alive
**** v1    vsl|       1004 Timestamp       c Resp: 1738839586.533676 0.014500 0.000230
**** v1    vsl|       1004 ReqAcct         c 53 0 53 165 0 165
**** v1    vsl|       1004 End             c 
**** v1    vsl|       1003 SessClose       c REM_CLOSE 0.016

So the answer is: because we set OC_F_HFM for passes.
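Reduced to its decision, the HEAD branch of cnt_transmit() shown in the debug diff can be modelled as below. This is a sketch under assumptions. MODEL_OC_F_HFM is a stand-in bit, not the real flag definition. The else branch is collapsed into "the stored Content-Length is dropped", because the diff is cut off right after the `req->resp_len >= 0` check. The second function models what this PR changes.

```c
/* Made-up bit value standing in for Varnish's OC_F_HFM object flag. */
#define MODEL_OC_F_HFM	(1U << 0)

enum head_cl_action {
	HEAD_CL_KEEP_STORED,	/* send the Content-Length stored with the object */
	HEAD_CL_DROP_STORED	/* http_Unset() it; the cut-off resp_len branch is not modelled */
};

/* HEAD handling before this PR, as read from the cnt_transmit() diff. */
static enum head_cl_action
head_cl_before_pr(unsigned oc_flags)
{
	if (oc_flags & MODEL_OC_F_HFM)	/* hit-for-miss objects, and passes */
		return (HEAD_CL_KEEP_STORED);
	return (HEAD_CL_DROP_STORED);
}

/* HEAD handling with this PR: the stored Content-Length is always kept. */
static enum head_cl_action
head_cl_with_pr(unsigned oc_flags)
{
	(void)oc_flags;
	return (HEAD_CL_KEEP_STORED);
}
```

Because passes carry OC_F_HFM, head_cl_before_pr() already kept the stored header for them, which is why pass worked. Plain cache hits took the other branch.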

nigoroll · 2025-02-06T16:20:45Z

In the following, I refer to a response that, by definition, does not have an HTTP body (a response to a CONNECT or HEAD request, or any response with a 1xx (Informational), 204 (No Content), or 304 (Not Modified) status code) as "without body", and to all others as "with body", even if the body may be empty.
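The with/without body classification can be written as a small predicate. This is a minimal sketch: the function name is hypothetical, and it only encodes the rules listed in this comment (request method CONNECT or HEAD, or status 1xx, 204 or 304).

```c
#include <string.h>

/*
 * Returns 1 for a response "with body", 0 for one "without body", using
 * only the rules from this comment. A "with body" response may still
 * have an empty body.
 */
static int
response_has_body(const char *req_method, unsigned status)
{
	if (strcmp(req_method, "HEAD") == 0 ||
	    strcmp(req_method, "CONNECT") == 0)
		return (0);
	if (status >= 100 && status < 200)	/* 1xx Informational */
		return (0);
	if (status == 204 || status == 304)	/* No Content, Not Modified */
		return (0);
	return (1);
}
```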

* There should be a way for VCL to stop sending `C-L`.

I have pondered the how and worked on an implementation, and for now, I am unhappy with what I have: unset resp.body seems wrong. For responses with body, it would mean sending Content-Length: 0, while for responses without body, it would mean removing the Content-Length header. This really violates the principle of least astonishment (POLA).

Coming up with a better alternative is surprisingly hard because, in most cases, we create or recreate the Content-Length header only after VCL has finished. Also, trying to find a way to give VCL control over Content-Length soon gets messy because of streaming...

Hence, I lean towards a very simple solution to make the change after VCL has finished:

Add a "removeCL" filter, which

for responses without body just removes Content-Length
for responses with body removes Content-Length and prevents the body from being sent; for chunked encoding, this means it also sends the terminating chunk.

This implies that filters will also need to run for responses without a body, but, at least from my perspective, this is already an overdue change for consistency.

The vcl interface would be simple, for example:

sub vcl_deliver {
    set resp.filters += " removeCL";
}
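The intended effect of the proposed filter on the wire can be sketched as follows. None of these names exist in Varnish, and this is not the VDP interface. The model only tracks whether Content-Length is present and what, if anything, must still be written in place of the body. For a non-chunked response with body, the proposal does not say how the response is framed, so the sketch simply writes nothing.

```c
/*
 * Hypothetical model of the proposed "removeCL" delivery filter;
 * not the VDP API.
 */
struct model_resp {
	int	has_cl_header;	/* Content-Length present in resp headers */
	int	has_body;	/* "with body" in the sense defined in this thread */
	int	chunked;	/* body would be sent with chunked encoding */
};

/*
 * Drops Content-Length in every case and suppresses the body. Returns the
 * bytes that still need to go on the wire in place of the body: the
 * terminating chunk for a chunked response with body, "" otherwise.
 */
static const char *
removecl_apply(struct model_resp *r)
{
	r->has_cl_header = 0;
	if (r->has_body && r->chunked)
		return ("0\r\n\r\n");	/* last-chunk followed by empty trailer */
	return ("");
}
```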

@dridi I guess you might have opinions, do you?

dridi · 2025-02-06T17:05:27Z

Also, trying to find a way to give VCL control over Content-Length gets messy soon because of streaming...

We specifically made content-length and transfer-encoding read-only headers because of their role in HTTP framing (especially HTTP/1.x). So it shouldn't be direct control.

Add a "removeCL" filter

This is interesting, but I don't really understand what you are proposing.

for responses without body just removes Content-Length

for responses with body removes Content-Length and prevents a body to be sent, that is, for chunked encoding, it also sends an end chunk.

In both cases we end up without a body delivery, so this really looks like a case for unset resp.body (consistent with unset bereq.body). It would be both a simpler interface and simpler behavior to explain:

discard the response body if there is one
discard framing headers (both content-length and transfer-encoding)

We may want to also discard content-encoding if there was a body.
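Here is a sketch of the header side of the `unset resp.body` semantics suggested here: drop both framing headers and, if there was a body, Content-Encoding as well. The plain array of "Name: value" strings and both helper names are assumptions for illustration. Discarding the body itself is not modelled.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive check whether header line hdr ("Name: value") is name. */
static int
hdr_is(const char *hdr, const char *name)
{
	while (*name != '\0') {
		if (tolower((unsigned char)*hdr) !=
		    tolower((unsigned char)*name))
			return (0);	/* also stops at the end of a short hdr */
		hdr++;
		name++;
	}
	return (*hdr == ':');
}

/*
 * Removes Content-Length and Transfer-Encoding and, if the response had a
 * body, Content-Encoding. Compacts hdrs in place; returns the new count.
 */
static size_t
unset_body_headers(const char **hdrs, size_t n, int had_body)
{
	size_t i, j = 0;

	for (i = 0; i < n; i++) {
		if (hdr_is(hdrs[i], "Content-Length") ||
		    hdr_is(hdrs[i], "Transfer-Encoding"))
			continue;
		if (had_body && hdr_is(hdrs[i], "Content-Encoding"))
			continue;
		hdrs[j++] = hdrs[i];
	}
	return (j);
}
```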

This implies that filters will also need to run for responses without a body, but, at least from my perspective, this is already an overdue change for consistency.

No opinion, I haven't given it much thought, but I am sensitive to the consistency argument. I'm pretty sure that filters already fiddle with headers today, so having headers-only filters makes sense (making the VDP::bytes() and VFP::pull() callbacks optional).

I'm not a big fan of header manipulation in core or VMOD code, but I can see several cases in favor of this kind of "rendez-vous point" to tweak headers before/after a delivery/fetch: essentially, cases where VCL syntax is too limited.

nigoroll · 2025-02-07T11:45:21Z

Re @dridi

Also, trying to find a way to give VCL control over Content-Length gets messy soon because of streaming...

We specifically made content-length and transfer-encoding read-only headers because of their role in HTTP framing (especially HTTP/1.x). So it shouldn't be direct control.

Yes, I understand what we did and why, but still this might not have been the best solution to the problem.

This ticket is about the response to HEAD requests with respect to the Content-Length returned to clients. We have the following different cases to consider:

1. Stable cache object with known length: The Content-Length header stored with the cache object might still be wrong, and we need to correct it based on the actual length of the cached body data.
2. Busy object (streaming) with Content-Length: The header might turn out to be wrong, but when VCL runs, it is the best approximation we have.
3. Busy object with chunked encoding: We have no Content-Length at all.

If we wanted to have a correct Content-Length for the busy cases (2) and (3), we could wait for the object to be completely received, but that would not work with transit_buffer.

So, for the case of the HEAD request, we can send a "probably correct" Content-Length with (1), a "maybe correct" Content-Length with (2), and no Content-Length with (3). Which is what this PR does in its current form.
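The three cases map onto a simple decision. In this hypothetical sketch, -1 stands for "send no Content-Length". stored_len is the actual length of the cached body data, and hdr_len is the value of the Content-Length header received from the backend (-1 if absent). The names are illustrative and are not taken from cache_req_fsm.c.

```c
#include <stdint.h>

/* The three object states from the list above; names are hypothetical. */
enum obj_state {
	OBJ_STABLE,		/* (1) stable object, body length known */
	OBJ_BUSY_WITH_CL,	/* (2) still streaming, backend sent Content-Length */
	OBJ_BUSY_CHUNKED	/* (3) still streaming, chunked, no Content-Length */
};

/*
 * Content-Length to send in response to HEAD, or -1 to send none.
 * stored_len: actual length of the cached body data, used for (1)
 * hdr_len:    Content-Length header value from the backend, -1 if absent
 */
static int64_t
head_content_length(enum obj_state st, int64_t stored_len, int64_t hdr_len)
{
	switch (st) {
	case OBJ_STABLE:
		return (stored_len);	/* "probably correct" */
	case OBJ_BUSY_WITH_CL:
		return (hdr_len);	/* "maybe correct": best guess while streaming */
	case OBJ_BUSY_CHUNKED:
	default:
		return (-1);		/* nothing to go by */
	}
}
```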

Now the bugwash decision was that VCL should have a way to prevent sending Content-Length with a response to HEAD, and the current question is HOW.

In order to not repeat myself, please re-read #4247 (comment) with the above in mind. The problem is that unset resp.body is, I think, just wrong for the case of "prevent sending Content-Length in response to HEAD".

Add a "removeCL" filter

This is interesting, but I don't really understand what you are proposing.

for responses without body just removes Content-Length

for responses with body removes Content-Length and prevents a body to be sent, that is, for chunked encoding, it also sends an end chunk.

In both cases we end up without a body delivery, so this really looks like a case for unset resp.body (consistent with unset bereq.body). It would be both a simpler interface and simpler behavior to explain:

It's not the same, because for unset resp.body we would preferably send Content-Length: 0 in response to a GET.
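The difference can be stated as the Content-Length each approach would leave on the response, where -1 means no header at all. This models the semantics as argued in this comment (Content-Length: 0 for a GET after `unset resp.body`), not an existing implementation. Both function names are hypothetical.

```c
/*
 * Content-Length left on the response, -1 meaning "no header at all".
 * has_body: nonzero for a response "with body" in the sense used here.
 */
static long
cl_after_unset_resp_body(int has_body)
{
	/* body dropped, but a GET response would still be framed as empty */
	return (has_body ? 0L : -1L);
}

static long
cl_after_removecl_filter(int has_body)
{
	(void)has_body;		/* same outcome whether or not there is a body */
	return (-1L);
}
```

In this model, only the filter gives the same outcome for responses with and without body, which is the property wanted for suppressing Content-Length on HEAD.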

nigoroll · 2025-02-19T14:31:44Z

While @dridi and I should find consensus, I think this PR could be merged. The question of how to avoid sending Content-Length is, I think, separate enough.

nigoroll mentioned this pull request Jan 2, 2025

Content-Length is always 0 for content obtained from a response to a HEAD request #4245 (open)

dridi reviewed Jan 2, 2025


bin/varnishd/cache/cache_req_fsm.c (outdated, resolved)

nigoroll force-pushed the 4245_head_cl branch from 81816ed to b59f59a on January 3, 2025 09:08

sbraz reviewed Jan 3, 2025


nigoroll force-pushed the 4245_head_cl branch from b59f59a to 9770c6f on February 6, 2025 10:51

nigoroll added the a=need bugwash label Feb 6, 2025


nigoroll commented Jan 2, 2025

dridi left a comment

sbraz Jan 3, 2025

nigoroll Jan 4, 2025

nigoroll commented Jan 6, 2025

nigoroll commented Feb 6, 2025

nigoroll commented Feb 6, 2025

dridi commented Feb 6, 2025

nigoroll commented Feb 7, 2025

nigoroll commented Feb 19, 2025
