-
-
Notifications
You must be signed in to change notification settings - Fork 289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Race Condition with rendering parts in goroutines with templ-initialized context #977
Comments
A couple of questions:
|
Hi, just looking into this. Wouldn't you also want to put a mutex around the If you did that, I don't think you would have a race condition. The suspense example in the repo uses a So, I don't really understand what you're trying to do. Are you thinking that you could get better performance by rendering multiple parts of a page in parallel, or something else? If it's that you think you'll get better performance by rendering individual sections, then I think you could avoid the race condition by having some HTTP middleware that initializes the context as it receives a request, to enable that, but I suspect that the overhead of creating buffers and copying them around would negate any benefits. |
Hi, thanks for taking a look at it! Lets start with why we are doing this. In our case, we have a lot of heavy snippets rendered in parallel but written sequentially. We saw a 50-60ms saving on paralleling the rendering. Mutex around writer could be added in the async renderer in case of suspence, but for us the order matters. So we do see better performance with rendering in parallel.
Unfortunately, it is exactly what doesn't work here. If the context is initialized, than every goroutine would receive the same initialized version of context with the same pointer to Children. Then each of the parallel goroutines starts reading and cleaning Children in a shared pointer, hence the race condition. What we need is to set a tombstone in the context before handing it over to goroutines and make all the goroutines initialize their context again with their own Children pointer |
So, the objective is to render various sections to sub-buffers, then to render them out in the expected sequence. I can see how that could, in some cases, improve the performance. However, @joerdav's comments on the PR are correct. The context design currently expects single threaded use, and For example: https://github.com/a-h/templ/blob/8e2db03531d9f905c69136dac1b64ee99fc9aa0f/runtime.go#L502C1-L515C2 It expects that it can read and write to the context without a mutex. Lines 53 to 62 in 8e2db03
We could consider updating the way that the context works so that children are separate. This would allow us to have something like this: package main
import (
"bytes"
"context"
"errors"
"io"
"os"
"sync"
"github.com/a-h/templ"
)
func main() {
// Create a new component.
page := Page()
// Render the component.
page.Render(context.Background(), os.Stdout)
}
func Concurrent(components ...templ.Component) templ.Component {
return ConcurrentComponent{Components: components}
}
type ConcurrentComponent struct {
Components []templ.Component
}
func (c ConcurrentComponent) Render(ctx context.Context, w io.Writer) error {
waitGroups := make([]sync.WaitGroup, len(c.Components))
buffers := make([]*bytes.Buffer, len(c.Components))
errs := make([]error, len(c.Components))
// Start rendering.
for i := range c.Components {
waitGroups[i].Add(1)
buffers[i] = new(bytes.Buffer)
go func(i int) {
defer waitGroups[i].Done()
errs[i] = c.Components[i].Render(ctx, buffers[i])
}(i)
}
// Render the results.
for i := range c.Components {
waitGroups[i].Wait()
if errs[i] != nil {
break
}
_, errs[i] = io.Copy(w, buffers[i])
}
return errors.Join(errs...)
} And use it like this: templ Page() {
<!DOCTYPE html>
<html>
<head>
<title>Page</title>
</head>
<body>
<h1>Page</h1>
@Concurrent(ExpensiveChild(1), ExpensiveChild(2), ExpensiveChild(3), ExpensiveChild(4), ExpensiveChild(5), ExpensiveChild(6), ExpensiveChild(7), ExpensiveChild(8), ExpensiveChild(9), ExpensiveChild(10))
</body>
</html>
}
templ ExpensiveChild(i int) {
@templ.Flush() {
<div>
<h2>Expensive Child { strconv.Itoa(i) }</h2>
@ExpensiveComponent()
</div>
}
}
func ExpensiveComponent() templ.Component {
return templ.ComponentFunc(func(ctx context.Context, w io.Writer) error {
time.Sleep(1 * time.Second)
_, err := w.Write([]byte("Expensive Component"))
return err
})
} With the expected output. <!doctype html><html><head><title>Page</title></head><body><h1>Page</h1><div><h2>Expensive Child 1</h2>Expensive Component</div><div><h2>Expensive Child 2</h2>Expensive Component</div><div><h2>Expensive Child 3</h2>Expensive Component</div><div><h2>Expensive Child 4</h2>Expensive Component</div><div><h2>Expensive Child 5</h2>Expensive Component</div><div><h2>Expensive Child 6</h2>Expensive Component</div><div><h2>Expensive Child 7</h2>Expensive Component</div><div><h2>Expensive Child 8</h2>Expensive Component</div><div><h2>Expensive Child 9</h2>Expensive Component</div><div><h2>Expensive Child 10</h2>Expensive Component</div></body></html> Really, we should also adjust the children behaviour. At the moment, you get one child component, but it would probably be nicer syntax to do: @Concurrent() {
<div>Level 1</div>
<div>Level 2</div>
} i.e. we return a slice of components (as a type that implements templ.Component). I've added a draft PR above. Using a There's probably some performance implication to this change. @joerdav - you commented on the PR, just wanted to check you had seen this issue. 😁 Would love your thoughts on it. Might be one to discuss on a call. |
It all looks good to me. I'd expect the performance implications to be minimal and only really come into play in these very niche scenarios where concurrent renders are using the same ctx. A couple of things come to mind, on a bit of a tangent, I do think that children being a
In general though, the addition of the mutex, I can't see any issues here besides one.. this could mean that the rendered result is not going to always be consistend. Imagine you have 2 components Foo and Bar, both render a component which contains a |
Yes, I think the I thought With Good point on the race condition for A concurrent renderer would need to take that into account somehow. |
In the current implementation, there is an assumption that all the parts are rendered in a single thread. Context value for children is modified for the whole context. If snippets need to be rendered in goroutines, it creates a race condition where children go missing or sporadically added
To Reproduce
https://github.com/kanstantsin-chernik/templ/blob/suspense-async-rendered/examples/suspense-async-rendered/main.templ
I also added comments to the generated code
Expected behavior
Each templ component is independent and can be rendered in a goroutine
Screenshots
Logs
templ info
output% templ info
(✓) os [ goos=linux goarch=amd64 ]
(✓) go [ location=/home/user/go-code/bin/go version=go version go1.23.2 X:nocoverageredesign linux/amd64 ]
(✓) gopls [ location=/opt/go/path/bin/gopls version=golang.org/x/tools/gopls v0.14.2
golang.org/x/tools/[email protected] h1:sIw6vjZiuQ9S7s0auUUkHlWgsCkKZFWDHmrge8LYsnc= ]
(✓) templ [ location=/home/user/go-code/bin/templ version=v0.2.778 ]
Desktop (please complete the following information):
templ version
): v0.2.778go version
): go version go1.23.2 X:nocoverageredesign linux/amd64gopls
version (gopls version
):golang.org/x/tools/gopls v0.14.2Additional context
The text was updated successfully, but these errors were encountered: