cache_req_fsm: keep the cache object's Content-Length for HEAD always #4247

@nigoroll nigoroll commented Jan 2, 2025

Previously, we would only keep the Content-Length header for HEAD requests on hit-for-miss objects. Now we simply keep it always, to enable "fallback" caching of HEAD requests.

The added vtc implements the basics of the logic to enable the (reasonable) use case documented in #2107 (comment), but using Vary instead of cache key modification plus restart.

Fixes #4245
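
To make the behavior change concrete, here is a simplified model in C of the Content-Length decision before and after this PR. It is a sketch under stated assumptions, not the code from cache_req_fsm.c: the function names, parameters and the `NO_CL` marker are hypothetical, and checks on the response status are left out. The interesting case is a cached response to a HEAD fetch, where the stored header carries the real length but no body data is cached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* In this model, -1 stands for "no Content-Length header". */
#define NO_CL ((intmax_t)-1)

/*
 * Hypothetical model of the old behavior.
 * stored_cl: Content-Length header stored with the cache object
 * body_len:  length of the cached body data, or -1 if unknown
 * Returns the Content-Length sent to the client.
 */
intmax_t
resp_cl_before(bool is_head, bool is_hfm, intmax_t stored_cl,
    intmax_t body_len)
{
	assert(stored_cl >= NO_CL && body_len >= -1);
	if (is_head && is_hfm)
		return (stored_cl);	/* header left untouched */
	/* header regenerated from the cached body length */
	return (body_len >= 0 ? body_len : NO_CL);
}

/* Hypothetical model of the new behavior: HEAD always keeps the header. */
intmax_t
resp_cl_after(bool is_head, intmax_t stored_cl, intmax_t body_len)
{
	assert(stored_cl >= NO_CL && body_len >= -1);
	if (is_head)
		return (stored_cl);
	return (body_len >= 0 ? body_len : NO_CL);
}
```

In this model, `resp_cl_before(true, false, 1234, 0)` yields 0, which corresponds to the symptom reported in the linked issue, while `resp_cl_after(true, 1234, 0)` yields 1234.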

@dridi dridi left a comment

LGTM, but this was not a thorough review.

sub vcl_backend_fetch {
if (bereq.http.X-Fetch-Method) {
set bereq.method = bereq.http.X-Fetch-Method;
}
In this example configuration, the X-Fetch-Method header can't be unset here before sending the request to the backend or it breaks the Vary part, right? I don't mind sending an extra header to my backend but it's one thing that differs from the restart-based solution.

nigoroll (Member Author) replied:

Correct. We need to (un)set the header before the cache lookup such that the right variant gets hit, if present. For a miss, the header gets copied to the backend request and, when it completes (after vcl_backend_response {} returns), the header's value gets added to the Vary specification for that cache object.

So, in short, the header needs to be present during cache lookup and at the end of vcl_backend_response {}. For practical reasons, it is also needed to signal the backend side to activate the Vary handling.
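
Why the header has to be present at both points can be illustrated with a toy model of variant matching, written in C. This is a sketch only: `struct variant` and `variant_matches` are hypothetical names, a single varying header is assumed, and the real lookup code is considerably more involved. The model assumes that an object without a Vary specification matches any request, and that an object with one matches only if the request header has the recorded value (absent matching absent).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy cache object with at most one varying header (X-Fetch-Method). */
struct variant {
	bool        has_vary;	/* object carries Vary: X-Fetch-Method */
	const char *vary_value;	/* recorded header value, NULL if absent */
};

/*
 * Hypothetical lookup check: req_value is the request's X-Fetch-Method
 * value at lookup time, or NULL if the header is not set.
 */
bool
variant_matches(const struct variant *v, const char *req_value)
{
	assert(v != NULL);
	if (!v->has_vary)
		return (true);	/* no Vary: matches every request */
	if (v->vary_value == NULL || req_value == NULL)
		return (v->vary_value == NULL && req_value == NULL);
	return (strcmp(v->vary_value, req_value) == 0);
}
```

In this model, an object fetched with GET (no Vary) also serves HEAD requests, while an object recorded with the value "HEAD" only serves requests carrying the header. If the header were missing at the end of vcl_backend_response {}, the object would be recorded with an absent value and would match plain GET requests instead of HEAD requests.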

With these requirements in mind, we can change the code to not send the header by deleting it in vcl_backend_fetch {} and restoring it in vcl_backend_response {}, but we need a vmod to do so. Here's what the test case adjustment looks like with a taskvar.bool object as a simple marker to activate the Vary handling:

diff --git a/bin/varnishtest/tests/r04245.vtc b/bin/varnishtest/tests/r04245.vtc
index 27244e053..42982b79a 100644
--- a/bin/varnishtest/tests/r04245.vtc
+++ b/bin/varnishtest/tests/r04245.vtc
@@ -13,6 +13,12 @@ server s1 {
 } -start
 
 varnish v1 -vcl+backend {
+    import taskvar;
+
+    sub vcl_init {
+       new vary_x_fetch_method = taskvar.bool();
+    }
+
     sub vcl_recv {
        if (req.method == "HEAD") {
            set req.http.X-Fetch-Method = "HEAD";
@@ -24,13 +30,17 @@ varnish v1 -vcl+backend {
     sub vcl_backend_fetch {
        if (bereq.http.X-Fetch-Method) {
            set bereq.method = bereq.http.X-Fetch-Method;
+           # use marker to avoid sending the header to the backend
+           unset bereq.http.X-Fetch-Method;
+           vary_x_fetch_method.set(true);
        }
     }
 
     sub vcl_backend_response {
        # NOTE: this use of Vary is specific to this case, it is
        # usually WRONG to only set Vary for a specific condition
-       if (bereq.http.X-Fetch-Method) {
+       if (vary_x_fetch_method.get()) {
+           set bereq.http.X-Fetch-Method = bereq.method;
            if (beresp.http.Vary) {
                set beresp.http.Vary += ", X-Fetch-Method";
            } else {

Within the varnish-cache tree, we only use bundled vmods, so this change cannot be applied to the proposed patch.

An even simpler way would be to use bereq.method == "HEAD" as the marker in vcl_backend_response {}, which should be possible if the additional logic is only used for HEAD. That is, it should work exactly as in the test case, but might cause trouble in real world VCL:

diff --git a/bin/varnishtest/tests/r04245.vtc b/bin/varnishtest/tests/r04245.vtc
index 27244e053..44edbd5bc 100644
--- a/bin/varnishtest/tests/r04245.vtc
+++ b/bin/varnishtest/tests/r04245.vtc
@@ -24,13 +24,15 @@ varnish v1 -vcl+backend {
     sub vcl_backend_fetch {
        if (bereq.http.X-Fetch-Method) {
            set bereq.method = bereq.http.X-Fetch-Method;
+           unset bereq.http.X-Fetch-Method;
        }
     }
 
     sub vcl_backend_response {
        # NOTE: this use of Vary is specific to this case, it is
        # usually WRONG to only set Vary for a specific condition
-       if (bereq.http.X-Fetch-Method) {
+       if (bereq.method == "HEAD") {
+           set bereq.http.X-Fetch-Method = bereq.method;
            if (beresp.http.Vary) {
                set beresp.http.Vary += ", X-Fetch-Method";
            } else {

nigoroll commented Jan 6, 2025

notes from bugwash:

  • There should be a way for VCL to stop sending C-L.

my own homework:

  • understand why the current code works for pass

nigoroll commented Feb 6, 2025

homework: why does the current code work?

diff --git a/bin/varnishd/cache/cache_req_fsm.c b/bin/varnishd/cache/cache_req_fsm.c
index bbcb3824f..91ec23780 100644
--- a/bin/varnishd/cache/cache_req_fsm.c
+++ b/bin/varnishd/cache/cache_req_fsm.c
@@ -493,6 +493,7 @@ cnt_transmit(struct worker *wrk, struct req *req)
                         * filters have had a chance to chew on it, but that
                         * would negate the "pass for huge objects" use case.
                         */
+                       VSLb(req->vsl, SLT_Debug, "HEAD with OC_F_HFM");
                } else {
                        http_Unset(req->resp, H_Content_Length);
                        if (req->resp_len >= 0)
$ ./varnishtest -iv tests/b00065.vtc | grep -C 5 'HEAD wi'
**** v1    vsl|       1004 RespHeader      c Via: 1.1 v1 (Varnish/trunk)
**** v1    vsl|       1004 VCL_call        c DELIVER
**** v1    vsl|       1004 VCL_return      c deliver
**** v1    vsl|       1004 Timestamp       c Process: 1738839586.533446 0.014269 0.000088
**** v1    vsl|       1004 Filters         c 
**** v1    vsl|       1004 Debug           c HEAD with OC_F_HFM
**** v1    vsl|       1004 RespHeader      c Connection: keep-alive
**** v1    vsl|       1004 Timestamp       c Resp: 1738839586.533676 0.014500 0.000230
**** v1    vsl|       1004 ReqAcct         c 53 0 53 165 0 165
**** v1    vsl|       1004 End             c 
**** v1    vsl|       1003 SessClose       c REM_CLOSE 0.016

So the answer is: because we set OC_F_HFM for passes.

nigoroll commented Feb 6, 2025

In the following, I refer to a response which, by definition, does not have an HTTP body (CONNECT or HEAD request and any response with a 1xx (Informational), 204 (No Content), or 304 (Not Modified) status code) as without body, and all others as with body, even if the body may be empty.
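
The distinction can be written down as a small predicate. The following C sketch only encodes the definition as worded in this comment; the function name `resp_is_without_body` and its parameters are hypothetical and not taken from the Varnish source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true if the response is "without body" by
 * definition. method is the request method, status the response status.
 */
bool
resp_is_without_body(const char *method, int status)
{
	assert(method != NULL);
	assert(status >= 100 && status <= 599);

	/* request methods whose responses never carry a body here */
	if (strcmp(method, "HEAD") == 0 || strcmp(method, "CONNECT") == 0)
		return (true);
	/* 1xx (Informational), 204 (No Content), 304 (Not Modified) */
	if (status < 200 || status == 204 || status == 304)
		return (true);
	return (false);
}
```

Note that a 200 response to a GET counts as with body in this model even when its body happens to be empty.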

> • There should be a way for VCL to stop sending C-L.

I have pondered the how and worked on an implementation, and for now, I am unhappy with what I have: unset resp.body seems wrong. For responses with body, it would mean sending a Content-Length: 0, while for responses without body it would mean clearing the Content-Length: 0 header. This really violates POLA (the principle of least astonishment).

Coming up with a better alternative is surprisingly hard, because, for most cases, we either create or recreate the Content-Length header when VCL has already finished. Also, trying to find a way to give VCL control over Content-Length gets messy soon because of streaming...

Hence, I lean towards a very simple solution to make the change after VCL has finished:

Add a "removeCL" filter, which

  • for responses without body just removes Content-Length
  • for responses with body removes Content-Length and prevents a body from being sent, that is, for chunked encoding, it also sends an end chunk.

This implies that filters will also need to run for responses without a body, but, at least from my perspective, this is already an overdue change for consistency.

The vcl interface would be simple, for example:

sub vcl_deliver {
    set resp.filters += " removeCL";
}

@dridi I guess you might have opinions, do you?

dridi commented Feb 6, 2025

> Also, trying to find a way to give VCL control over Content-Length gets messy soon because of streaming...

We specifically made content-length and transfer-encoding read-only headers because of their role in HTTP framing (especially HTTP/1.x). So it shouldn't be direct control.

> Add a "removeCL" filter

This is interesting, but I don't really understand what you are proposing.

> • for responses without body just removes Content-Length
> • for responses with body removes Content-Length and prevents a body to be sent, that is, for chunked encoding, it also sends an end chunk.

In both cases we end up without a body delivery, so this really looks like a case for unset resp.body (consistent with unset bereq.body). It would be both a simpler interface and simpler behavior to explain:

  • discard the response body if there is one
  • discard framing headers (both content-length and transfer-encoding)

We may want to also discard content-encoding if there was a body.

> This implies that filters will also need to run for responses without a body, but, at least from my perspective, this is already an overdue change for consistency.

No opinion, I haven't given it much thought, but I am sensitive to the consistency argument. I'm pretty sure that filters today already fiddle with headers, so having headers-only filters makes sense (making the VDP::bytes() and VFP::pull() callbacks optional).

I'm not a big fan of header manipulation in core or VMOD code, but I can see several cases in favor of this kind of "rendez-vous point" ability to tweak headers before/after a delivery/fetch. Essentially cases where VCL syntax is too limited.

nigoroll commented Feb 7, 2025

Re @dridi

> > Also, trying to find a way to give VCL control over Content-Length gets messy soon because of streaming...
>
> We specifically made content-length and transfer-encoding read-only headers because of their role in HTTP framing (especially HTTP/1.x). So it shouldn't be direct control.

Yes, I understand what we did and why, but still this might not have been the best solution to the problem.

This ticket is about the response to HEAD requests with respect to the Content-Length returned to clients. We have the following different cases to consider:

  1. Stable cache object with known length: The Content-Length header stored with the cache object might still be wrong, and we need to correct it based on the actual length of the cached body data.

  2. Busy object (streaming) with Content-Length: The header might turn out to be wrong, but when VCL runs, it is the best approximation we have.

  3. Busy object with chunked encoding: We have no Content-Length at all.

If we wanted to have a correct Content-Length for the busy cases (2) and (3), we could wait for the object to be completely received, but that would not work with transit_buffer.

So, for the case of the HEAD request, we can send a "probably correct" Content-Length with (1), a "maybe correct" Content-Length with (2), and no Content-Length with (3). Which is what this PR does in its current form.
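
The three cases and what a HEAD response can carry in each of them can be summarized as a lookup. This C sketch merely restates the enumeration above; the enum and function names are hypothetical and do not exist in the Varnish source.

```c
#include <assert.h>

/* The three cases enumerated above, numbered as in the list. */
enum obj_case {
	OBJ_STABLE = 1,		/* (1) stable object, known length */
	OBJ_BUSY_WITH_CL = 2,	/* (2) busy object with Content-Length */
	OBJ_BUSY_CHUNKED = 3	/* (3) busy object, chunked encoding */
};

/* How much the Content-Length sent for HEAD can be trusted. */
enum cl_quality {
	CL_NONE,		/* no Content-Length is sent */
	CL_MAYBE_CORRECT,
	CL_PROBABLY_CORRECT
};

/* Hypothetical summary of what this PR does for HEAD in each case. */
enum cl_quality
head_cl_quality(enum obj_case c)
{
	switch (c) {
	case OBJ_STABLE:
		return (CL_PROBABLY_CORRECT);
	case OBJ_BUSY_WITH_CL:
		return (CL_MAYBE_CORRECT);
	case OBJ_BUSY_CHUNKED:
		return (CL_NONE);
	}
	assert(!"unknown object case");
	return (CL_NONE);
}
```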

Now the bugwash decision was that VCL should have a way to prevent sending Content-Length with a response to HEAD, and the current question is HOW.

In order to not repeat myself, please re-read #4247 (comment) with the above in mind. The problem is that unset resp.body is, I think, just wrong for the case of "prevent sending Content-Length in response to HEAD".

> > Add a "removeCL" filter
>
> This is interesting, but I don't really understand what you are proposing.
>
> > • for responses without body just removes Content-Length
> > • for responses with body removes Content-Length and prevents a body to be sent, that is, for chunked encoding, it also sends an end chunk.
>
> In both cases we end up without a body delivery, so this really looks like a case for unset resp.body (consistent with unset bereq.body). It would be both a simpler interface and simpler behavior to explain:

It's not the same, because for unset resp.body we would preferably send Content-Length: 0 in response to a GET.
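
The difference between the two ideas can be put side by side in a small C model. Both functions are hypothetical: "removeCL" and unset resp.body are only proposals in this thread, and the model follows the reading given in these comments, with -1 standing for "no Content-Length header sent".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What ends up on the wire in this model. */
struct delivery {
	intmax_t content_length;	/* -1: header not sent */
	bool     send_body;
};

/* Proposed "removeCL" filter: never a Content-Length, never a body. */
struct delivery
apply_remove_cl(bool without_body)
{
	struct delivery d = { -1, false };

	(void)without_body;	/* same outcome for both kinds of response */
	return (d);
}

/*
 * "unset resp.body" as read in this comment: a response with body
 * (for example to a GET) would preferably announce Content-Length: 0.
 */
struct delivery
apply_unset_body(bool without_body)
{
	struct delivery d;

	d.send_body = false;
	d.content_length = without_body ? -1 : 0;
	return (d);
}
```

For a response with body, `apply_remove_cl(false)` leaves no Content-Length at all, while `apply_unset_body(false)` announces a length of 0. Only for responses without body do the two coincide in this model.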

nigoroll (Member Author) commented:

While @dridi and I should find consensus, I think this PR could be merged. The question of how to avoid sending Content-Length is, I think, separate enough.

Successfully merging this pull request may close these issues.

Content-Length is always 0 for content obtained from a response to a HEAD request
3 participants