Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add loom Support. #159

Merged
merged 2 commits into from
Jan 18, 2025
Merged

feat: Add loom Support. #159

merged 2 commits into from
Jan 18, 2025

Conversation

He-Pin
Copy link
Contributor

@He-Pin He-Pin commented Jan 11, 2025

Motivation:
Add Loom support.
refs: #90 , again

Modification:

  1. add handlerExecutor() and some helper methods for virtual threads.
  2. some documents.

Result:
Virtual threads supported.

wrk is needed to run the benchmark

  • miniAppWithSleep 100ms : ↑2300%
  • todoDb: ~↓6%
  • staticFiles: ~↓6%
./mill --no-build-lock benchmark.runBenchmarks

Results of same 4 Carrier/Platform threads:

[1] staticFilesWithLoom result with (platform threads):
[1] Running 30s test @ http://localhost:8080/
[1]   4 threads and 100 connections
[1]   Thread Stats   Avg      Stdev     Max   +/- Stdev
[1]     Latency     1.23ms  436.88us  28.64ms   98.91%
[1]     Req/Sec    20.54k     1.08k   22.69k    93.44%
[1]   2461250 requests in 30.10s, 342.70MB read
[1] Requests/sec:  81766.27
[1] Transfer/sec:     11.38MB
[1] 
[1] staticFilesWithLoom result with (virtual threads):
[1] Running 30s test @ http://localhost:8080/
[1]   4 threads and 100 connections
[1]   Thread Stats   Avg      Stdev     Max   +/- Stdev
[1]     Latency     4.41ms   14.69ms 157.91ms   95.15%
[1]     Req/Sec    19.00k     4.29k   22.09k    88.63%
[1]   2266289 requests in 30.02s, 315.55MB read
[1] Requests/sec:  75488.32
[1] Transfer/sec:     10.51MB
[1] 
[1] todoDbWithLoom result with (platform threads):
[1] Running 30s test @ http://localhost:8080/
[1]   4 threads and 100 connections
[1]   Thread Stats   Avg      Stdev     Max   +/- Stdev
[1]     Latency     1.22ms  172.42us  11.67ms   96.46%
[1]     Req/Sec    20.62k     1.29k   42.72k    93.01%
[1]   2466143 requests in 30.10s, 395.12MB read
[1]   Non-2xx or 3xx responses: 2466143
[1] Requests/sec:  81929.40
[1] Transfer/sec:     13.13MB
[1] 
[1] todoDbWithLoom result with (virtual threads):
[1] Running 30s test @ http://localhost:8080/
[1]   4 threads and 100 connections
[1]   Thread Stats   Avg      Stdev     Max   +/- Stdev
[1]     Latency     3.97ms   13.20ms 160.29ms   95.28%
[1]     Req/Sec    19.21k     3.68k   22.32k    89.41%
[1]   2284539 requests in 30.03s, 366.02MB read
[1]   Non-2xx or 3xx responses: 2284539
[1] Requests/sec:  76072.80
[1] Transfer/sec:     12.19MB
[1] 
[1] minimalApplicationWithLoom result with (platform threads):
[1] Running 30s test @ http://localhost:8080/
[1]   4 threads and 100 connections
[1]   Thread Stats   Avg      Stdev     Max   +/- Stdev
[1]     Latency     1.05s   575.57ms   1.99s    57.89%
[1]     Req/Sec    10.05      3.94    30.00     81.66%
[1]   1152 requests in 30.03s, 172.12KB read
[1]   Socket errors: connect 0, read 0, write 0, timeout 1076
[1] Requests/sec:     38.36
[1] Transfer/sec:      5.73KB
[1] 
[1] 
[1] minimalApplicationWithLoom result with (virtual threads):
[1] Running 30s test @ http://localhost:8080/
[1]   4 threads and 100 connections
[1]   Thread Stats   Avg      Stdev     Max   +/- Stdev
[1]     Latency   106.11ms    2.74ms 126.59ms   77.47%
[1]     Req/Sec   239.19     36.74   252.00     92.68%
[1]   28100 requests in 30.03s, 4.10MB read
[1] Requests/sec:    935.74
[1] Transfer/sec:    139.81KB


Some design choices:

  1. Using MethodHandle/Reflect to make it compile on Java 8 too.
  2. Using MethodHandle to name the virtual threads that are needed, JPMS code can be added to open the can by default, but that is a little over-killed, so better with an explicitly --add-opens java.base/java.lang=ALL-UNNAMED
  3. Add a virtualize or screen method to create a virtual thread executor from a platform thread pool, this is useful, especially if you want to limit the underlying queue size, FJP is unbounded.
  4. Users can override the handleExecutor directly too.

@lolgab
Copy link
Member

lolgab commented Jan 11, 2025

In Mill you use T.env to get always the updated environment. Otherwise you get the environment at the moment of instantiating the server, which is not what you usually want.

@He-Pin He-Pin force-pushed the loom branch 2 times, most recently from 23e03e1 to c6821b6 Compare January 12, 2025 09:27
@He-Pin
Copy link
Contributor Author

He-Pin commented Jan 12, 2025

@lihaoyi I think this pr is ready now, seems the virtual thread only improves the performance when heavy blocking.

@lihaoyi
Copy link
Member

lihaoyi commented Jan 13, 2025

@He-Pin looks good overall, please summarize your benchmark results into a short summary table to include in the documentation with a link back to this PR for people to view the full details

@He-Pin
Copy link
Contributor Author

He-Pin commented Jan 13, 2025

@lihaoyi I have updated with a summary table and added another note about classloading

@lihaoyi
Copy link
Member

lihaoyi commented Jan 13, 2025

Thanks @He-Pin, I'll leave the PR open a few more days to see if anyone else wants to comment before merging

@He-Pin
Copy link
Contributor Author

He-Pin commented Jan 13, 2025

@sorumehta Hi, this pr is ready, would you like to take a look at this, thanks.

@samikrc
Copy link

samikrc commented Jan 13, 2025

@He-Pin Quick question: once this is merged, do virtual threads become the default execution method? Or this happens based on certain condition(s)?

@He-Pin
Copy link
Contributor Author

He-Pin commented Jan 13, 2025

@samikrc thanks for intresting.

Short answer: No.

it will only enabled once

  1. Running with JDK 21 (or Where Virtual Thread is enabled),
  2. With the --add-opens java.base/java.lang=ALL-UNNAMED and -Dcask.virtual-thread.enabled=true, OR
  3. You override the handlerExecutor() and supply it a virtual threads backing executor, eg: Executors.newVirtualThreadPerTaskExecutor(), you can switch the default scheduler with cask.internal.Util.createVirtualThreadExecutor if you like.

Because, with the current implementation of JDK, virtual threads can lead to deadlock if they are not configured properly, we encountered this in production when a virtual thread and platform threads both tried to load the same class ByteBuf, but there is no carrier threads to execute the classloading, because the classloading happens in the CPP Native frame, where a mutex is hold, and the virtual threads can not be unmounted, and then deadlock.

The JVM team at Alibaba helped us with a tweaked AJDK, which will add one more carrier thread for this, and they backported the JDK24 features into AJDK21 too.

So, virtual threads help us reduce 50% ygc and 10 pt CPU usage under heavy load. Our system is fully async, so we can only see these improvements.

For any user who tries to use Virtual threads in production, I would suggest:

  1. try to fix the code where it pins virtual threads first.
  2. do heavy load testing, with some bursts.
  3. Configure a larger max pool size, if needed.

Copy link

@sideeffffect sideeffffect left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to see this PR 😍 Was curious and checked out the code. I've noticed one or two things.

Comment on lines +67 to +68
globalLogger.exception(e)
throw new UnsupportedOperationException("Failed to create newThreadPerTaskExecutor.", e)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exception is both logged and re-thrown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm throwing it on purpose.

cask/src/cask/internal/Util.scala Show resolved Hide resolved
cask/src/cask/main/Main.scala Outdated Show resolved Hide resolved
@megri
Copy link
Contributor

megri commented Jan 17, 2025

Some design choices:

  1. Using MethodHandle/Reflect to make it compile on Java 8 too.
  2. Using MethodHandle to name the virtual threads that are needed, JPMS code can be added to open the can by default, but that is a little over-killed, so better with an explicitly --add-opens java.base/java.lang=ALL-UNNAMED
  3. Add a virtualize or screen method to create a virtual thread executor from a platform thread pool, this is useful, especially if you want to limit the underlying queue size, FJP is unbounded.
  4. Users can override the handleExecutor directly too.

If I understand this correctly the --add-opens argument is needed due to the reflection done in cask.internal.Util to discern if the program is running on a JVM with virtual threads enabled. What's the benefit of doing this over letting the user override the Executor in code? It seems like letting the user provide the Executor would remove a lot of complicated reflection along with the need for the --add-opens.

Related to this, if the user has specified cask.virtual-thread.enabled (it should perhaps be "virtual-threads instead?), but runs on a JVM prior to virtual threads, the application will silently fall back to the default behaviour. Perhaps it is better to exit the application with an error if the requirements are not met.

@He-Pin
Copy link
Contributor Author

He-Pin commented Jan 17, 2025

@megri , Thanks.

  1. With this, we can switch to a different scheduler or CarrierThread pool, which you can't do before.
  2. All virtual threads are now named with cask—*, which will require a lot of pumping code to get right.
  3. With reflection/methodHandle, you can compile the code with Java 8 and make it work with JDK 8+ without the multi-release jar complication, we are using this like code at work too.
  4. cask.virtual-thread.enabled can be cask.virtual-threads.enabled, will change.
  5. Perhaps it is better to exit the application with an error if the requirements are not met. Not sure about this, but it will work with much less interfering with users.

@sideeffffect
Copy link

@megri you can still override handlerExecutor() and provide a Loom based one, without having to worry about --add-opens (which is quite a problematic ask of our users, especially in production environment).

@He-Pin
Copy link
Contributor Author

He-Pin commented Jan 17, 2025

Yes, you can still override it with your loom-based executor.
@sideeffffect, the problem is very easy to work out, with a JPMSUtils which can do the opening, but I think that's a little overkill and better to open it explicitly.

Thanks @sideeffffect and @megri for the review.

@megri
Copy link
Contributor

megri commented Jan 17, 2025

@megri , Thanks.

  1. With this, we can switch to a different scheduler or CarrierThread pool, which you can't do before.
  2. All virtual threads are now named with cask—*, which will require a lot of pumping code to get right.
  3. With reflection/methodHandle, you can compile the code with Java 8 and make it work with JDK 8+ without the multi-release jar complication, we are using this like code at work too.
  4. cask.virtual-thread.enabled can be cask.virtual-threads.enabled, will change.
  5. Perhaps it is better to exit the application with an error if the requirements are not met. Not sure about this, but it will work with much less interfering with users.

1 to 3: Not sure what you mean here. Let's say we have something like this (simplified)

class Main {
  def threadFactory: ThreadFactory =// users can override this to use virtual threads

  private def namedThreadFactory(name: String, wrappedThreadFactory: ThreadFactory): ThreadFactory = {
    val n = AtomicLong(0)
    runnable => 
      val t = wrappedThreadFactory(runnable)
      t.setName(s"$name-${n.getAndIncrement()}")
      t
  }
  
  def createExecutor(): Executor = { // users can override this to use too if they want different scheduling mechanics
    Executors.newThreadPerTaskExecutor(namedThreadFactory("cask", threadFactory))
  }

  def defaultHandler = new ThreadBlockingHandler(createExecutor(), new Main.DefaultHandler(…))
}

This also compiles on all java versions. Is something missing?

@megri
Copy link
Contributor

megri commented Jan 17, 2025

@megri you can still override handlerExecutor() and provide a Loom based one, without having to worry about --add-opens (which is quite a problematic ask of our users, especially in production environment).

That was my thought as well, so I'm wondering if it makes sense for the default to be to tell users to add those things.

@He-Pin
Copy link
Contributor Author

He-Pin commented Jan 17, 2025

the code needs to works with Java 11 as well , you can took a look at pekko , If zio support loom, I think they need to do the same.

@He-Pin He-Pin force-pushed the loom branch 2 times, most recently from 50bd94f to d6cb0d5 Compare January 17, 2025 14:02
@He-Pin
Copy link
Contributor Author

He-Pin commented Jan 17, 2025

image

This also compiles on all Java versions. Is something missing?

@megri This method is only available on JDK 21.

@megri
Copy link
Contributor

megri commented Jan 17, 2025

@megri

  1. apache/pekko@main/actor/src/main/scala/org/apache/pekko/dispatch/VirtualThreadSupport.scala
  2. eclipse-vertx/vert.x@master/vertx-core/src/main/java/io/vertx/core/impl/VertxImpl.java#L111C1-L125C4

You can see, both using reflection or method handle.

Granted, if the goal is to provide an out of the box no extra code needed solution this is the (only) way to go. If this is what's desired I have no objection, it's just now how I would expect to do it :)

@megri This method is only available on JDK 21.

Good catch, but easily replaceable with a SAM-implementation.

Thank you for your patience explaining the thought process behind your choices.

@He-Pin
Copy link
Contributor Author

He-Pin commented Jan 17, 2025

At $Work, where we provide a big jar to support Java 8/17/21, we are using the same trick. The IDE can't support multi-release jars very well.

You can see in apache/pekko#1724 , that I have to using Reflect, where privateLookup is only available in Java 9.

Good catch, but easily replaceable with a SAM-implementation.

Then it will not be a completed ExecutorService implementation.

@megri
Copy link
Contributor

megri commented Jan 17, 2025

Then it will not be a completed ExecutorService implementation.

No but an Executor: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executor.html

@He-Pin
Copy link
Contributor Author

He-Pin commented Jan 17, 2025

@megri Yes, I think the current Cask may need something like a graceful shutdown; otherwise, an executor API is insufficient.
I have updated the type signature.

@megri
Copy link
Contributor

megri commented Jan 17, 2025

@megri Yes, I think the current Cask may need something like a graceful shutdown; otherwise, an executor API is insufficient. I have updated the type signature.

Nice!

One thing I came to think of while walking home this evening was if the default Undertow workers (threads) need to be adjusted somehow, seeing that the intent is to not use them. They are set as part of the Undertow.Builder process, and may take up a quite significant amount of resources by default: https://github.com/undertow-io/undertow/blob/6ae61c6af88d2a8341922ccd0de98926e8349543/core/src/main/java/io/undertow/Undertow.java#L449

edit: Setting the XnioWorker explicitly during Undertow configuration might also be a cleaner way of integrating the virtual thread executorservice: https://github.com/xnio/xnio/blob/3.x/api/src/main/java/org/xnio/XnioWorker.java#L1185-L1187

@lihaoyi
Copy link
Member

lihaoyi commented Jan 18, 2025

Thanks @megri and @sideeffffect for the review and discussion! @He-Pin just let me know when you're done with this PR and I'm happy to merge this and send you the bounty,

@He-Pin
Copy link
Contributor Author

He-Pin commented Jan 18, 2025

@lihaoyi I think I'm done with this PR, actually this helped me to do more enhancement on Pekko too, thanks. apache/pekko#1724

Another question is, do you plan to switch out Undertown to Netty? I think with your current design, it's possible.

Another feedback is as @megri mentioned, we may need to expose more options to users from Undertown, but that can be done in another PR.

@lihaoyi lihaoyi merged commit 05ca192 into com-lihaoyi:master Jan 18, 2025
8 of 9 checks passed
@lihaoyi
Copy link
Member

lihaoyi commented Jan 18, 2025

Thanks @He-Pin , will transfer the bounty using the same bank details we used for the last one

@He-Pin
Copy link
Contributor Author

He-Pin commented Jan 18, 2025

OK, I have a HK HSBC account now too.

@He-Pin
Copy link
Contributor Author

He-Pin commented Jan 18, 2025

Thanks god, cask is using Java 11, not Java 8.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants