Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add EventLoop APIs for simpler scheduling of callbacks #2759

Open
wants to merge 40 commits into
base: main
Choose a base branch
from

Conversation

simonjbeaumont
Copy link
Contributor

@simonjbeaumont simonjbeaumont commented Jun 28, 2024

Motivation

The current scheduleTask APIs make use of both callbacks and promises, which leads to confusing semantics. For example, on cancellation, users are notified in two ways: once via the promise and once via the callback. Additionally the way the API is structured results in unavoidable allocations—for the closures and the promise—which could be avoided if we structured the API differently.

Modifications

This PR introduces new protocol requirements on EventLoop:

protocol EventLoop {
    // ...
    @discardableResult
    func scheduleCallback(at deadline: NIODeadline, handler: some NIOScheduledCallbackHandler) throws -> NIOScheduledCallback

    @discardableResult
    func scheduleCallback(in amount: TimeAmount, handler: some NIOScheduledCallbackHandler) throws -> NIOScheduledCallback

    func cancelScheduledCallback(_ scheduledCallback: NIOScheduledCallback)
}

Default implementations have been provided that call through to EventLoop.scheduleTask(in:_:) to not break existing EventLoop implementations, although this implementation will be (at least) as slow as using scheduleTask(in:_:) directly.

The API is structured to allow for EventLoop implementations to provide a custom implementation, as an optimization point and this PR provides a custom implementation for SelectableEventLoop, so that MultiThreadedEventLoopGroup can benefit from a faster implementation.

Finally, this PR adds benchmarks to measure the performance of setting a simple timer using both scheduleTask(in:_:) and scheduleCallback(in:_:) APIs using a MultiThreadedEventLoopGroup.

Result

A simpler and more coherent API surface.

There is also a small performance benefit for heavy users of this API, e.g. protocols that make extensive use of timers: when using MTELG to repeatedly set a timer with the same handler, switching from scheduleTask(in:_:) to scheduleCallback(in:_:) reduces almost all allocations (and amortizes to zero allocations) and is ~twice as fast.

MTELG.scheduleCallback(in:_:)
╒═══════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric                │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞═══════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ Malloc (total) *      │       0 │       0 │       0 │       0 │       0 │       0 │       0 │    1109 │
╘═══════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

MTELG.scheduleTask(in:_:)
╒═══════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric                │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞═══════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ Malloc (total) *      │       4 │       4 │       4 │       4 │       4 │       4 │       4 │     576 │
╘═══════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

@simonjbeaumont
Copy link
Contributor Author

@swift-server-bot test this please

Sources/NIOCore/EventLoop.swift Outdated Show resolved Hide resolved
Sources/NIOCore/NIOTimer.swift Outdated Show resolved Hide resolved
Sources/NIOCore/EventLoop.swift Outdated Show resolved Hide resolved
Sources/NIOCore/EventLoop.swift Outdated Show resolved Hide resolved
Sources/NIOCore/EventLoop.swift Outdated Show resolved Hide resolved
Sources/NIOCore/NIOTimer.swift Outdated Show resolved Hide resolved
Sources/NIOCore/NIOTimer.swift Outdated Show resolved Hide resolved
Sources/NIOCore/EventLoop.swift Outdated Show resolved Hide resolved
Sources/NIOCore/EventLoop.swift Outdated Show resolved Hide resolved
Sources/NIOCore/NIOTimer.swift Outdated Show resolved Hide resolved
Sources/NIOCore/EventLoop.swift Outdated Show resolved Hide resolved
Sources/NIOCore/EventLoop.swift Outdated Show resolved Hide resolved
Sources/NIOCore/NIOTimer.swift Outdated Show resolved Hide resolved
Sources/NIOCore/NIOTimer.swift Outdated Show resolved Hide resolved
Sources/NIOPosix/SelectableEventLoop.swift Outdated Show resolved Hide resolved
Sources/NIOPosix/SelectableEventLoop.swift Outdated Show resolved Hide resolved
Benchmarks/Benchmarks/NIOPosixBenchmarks/Benchmarks.swift Outdated Show resolved Hide resolved
Sources/NIOCore/NIOTimer.swift Outdated Show resolved Hide resolved

/// Set a timer that will call a handler at the given time.
@discardableResult
func setTimer(for deadline: NIODeadline, _ handler: any NIOTimerHandler) -> NIOTimer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add another bikeshed: is there any reason not to make this method generic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 06a4ce7. Although I didn't see much improvement from it.

Sources/NIOCore/NIOTimer.swift Outdated Show resolved Hide resolved

/// Set a timer that will call a handler after a given amount of time.
@discardableResult
func setTimer(for duration: TimeAmount, _ handler: any NIOTimerHandler) -> NIOTimer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I'm here, can I suggest that we should use different labels instead of for in both? That will help with tab-completion a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, it felt OK to me. Would you prefer the following?

  • setTimer(for: TimeAmount, ...)
  • setTimer(forDeadline: NIODeadline, ...)

I'm thinking of how I set timers verbally on my phone I normally always use "set a timer for" regardless of whether I then say a duration or an absolute time.

Open to suggestions here, though :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setTimer(for:) and setTimer(at:) is probably the easiest spelling distinction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/apple/swift-nio/pull/2759/files/df3cdb1c1f1f3ce8fb66fd81cdfa0cef3318c984#r1664313726 resulted in new API spellings. But these do use different prepositions for the NIODeadline and TimeAmount variants: scheduleCallback(at:handler:) and scheduleCallback(in:handler:), respectively.

Sources/NIOCore/NIOTimer.swift Outdated Show resolved Hide resolved
Copy link
Contributor Author

@simonjbeaumont simonjbeaumont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback so far folks. I've addressed it all and attempted to do so in a series of targeted commits that can be squashed on merge.

//cc @Lukasa @glbrntt @FranzBusch


/// Set a timer that will call a handler after a given amount of time.
@discardableResult
func setTimer(for duration: TimeAmount, _ handler: any NIOTimerHandler) -> NIOTimer
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/apple/swift-nio/pull/2759/files/df3cdb1c1f1f3ce8fb66fd81cdfa0cef3318c984#r1664313726 resulted in new API spellings. But these do use different prepositions for the NIODeadline and TimeAmount variants: scheduleCallback(at:handler:) and scheduleCallback(in:handler:), respectively.

Sources/NIOCore/NIOTimer.swift Outdated Show resolved Hide resolved

/// Set a timer that will call a handler at the given time.
@discardableResult
func setTimer(for deadline: NIODeadline, _ handler: any NIOTimerHandler) -> NIOTimer
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 06a4ce7. Although I didn't see much improvement from it.

Sources/NIOCore/NIOTimer.swift Outdated Show resolved Hide resolved
Copy link
Contributor

@glbrntt glbrntt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the method name for NIOScheduledCallbackHandler but beyond that this looks good aside from some nits.

Sources/NIOCore/EventLoop.swift Outdated Show resolved Hide resolved
Sources/NIOCore/EventLoop.swift Outdated Show resolved Hide resolved
Sources/NIOCore/EventLoop.swift Outdated Show resolved Hide resolved
/// This function is called at the scheduled time, unless the scheduled callback is cancelled.
///
/// - Parameter eventLoop: The event loop on which the callback was scheduled.
func onSchedule(eventLoop: some EventLoop)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming reads (to me) like this function is called at the time when the callback is scheduled (when eventLoop.scheduledCallback is called), as opposed to the time when the callback is scheduled to run.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handleScheduledCallback?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the name to handleScheduledCallback in 2ec5543.

Question: do we need to be more considerate about name clashes since we're expecting folks to conform their types to this protocol; e.g. should this be something like handleNIOScheduledCallback?

This case isn't quite covered by https://github.com/apple/swift-nio/blob/main/docs/public-api.md, but it seems similar in nature.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glbrntt did you have any thoughts regarding the namespacing or is it fine the way it is?

The function takes a NIO type, so at worst it could cause an overload? Maybe it's OK the way it is now.

Sources/NIOPosix/SelectableEventLoop.swift Outdated Show resolved Hide resolved
Sources/NIOPosix/SelectableEventLoop.swift Outdated Show resolved Hide resolved
Tests/NIOPosixTests/NIOScheduledCallbackTests.swift Outdated Show resolved Hide resolved
Copy link
Contributor Author

@simonjbeaumont simonjbeaumont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @glbrntt for your latest round of review. I have addressed most of the feedback. I wanted some additional thoughts on these two though:

Tests/NIOPosixTests/NIOScheduledCallbackTests.swift Outdated Show resolved Hide resolved
Sources/NIOPosix/SelectableEventLoop.swift Outdated Show resolved Hide resolved
Sources/NIOCore/EventLoop.swift Outdated Show resolved Hide resolved
Sources/NIOCore/EventLoop.swift Outdated Show resolved Hide resolved
Sources/NIOCore/EventLoop.swift Outdated Show resolved Hide resolved
Sources/NIOPosix/SelectableEventLoop.swift Outdated Show resolved Hide resolved
/// This function is called at the scheduled time, unless the scheduled callback is cancelled.
///
/// - Parameter eventLoop: The event loop on which the callback was scheduled.
func onSchedule(eventLoop: some EventLoop)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the name to handleScheduledCallback in 2ec5543.

Question: do we need to be more considerate about name clashes since we're expecting folks to conform their types to this protocol; e.g. should this be something like handleNIOScheduledCallback?

This case isn't quite covered by https://github.com/apple/swift-nio/blob/main/docs/public-api.md, but it seems similar in nature.

WDYT?

Copy link
Contributor

@glbrntt glbrntt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The try! needs fixing but otherwise I'm happy with this!

Benchmarks/Benchmarks/NIOPosixBenchmarks/Benchmarks.swift Outdated Show resolved Hide resolved
Sources/NIOCore/NIOScheduledCallback.swift Outdated Show resolved Hide resolved
Sources/NIOPosix/SelectableEventLoop.swift Outdated Show resolved Hide resolved
Sources/NIOPosix/MultiThreadedEventLoopGroup.swift Outdated Show resolved Hide resolved
@simonjbeaumont
Copy link
Contributor Author

OK, cancellation support added. FYI @glbrntt @Lukasa @FranzBusch

@simonjbeaumont simonjbeaumont changed the title Add EventLoop APIs for cheaper setting of timers Add EventLoop APIs for cheap scheduling of callbacks Jul 25, 2024
@weissi
Copy link
Member

weissi commented Jul 26, 2024

Motivation

Scheduling a timer is currently an expensive operation because this is done using EventLoop.scheduleTask(in:_:), which results in several allocations. A simple timer use case does not typically need all the functionality of this API and use cases that set many timers (e.g. for timeouts) pay this cost repeatedly.

@simonjbeaumont Could you add information here on why EventLoop.scheduleTask results in more allocations than this? It shouldn't, maybe that's fixable? EventLoop.scheduledTask is meant to kinda be the simplest timer API but the PR contrasts it with "simple timers" which suggests that I'm missing something here.

Also how many allocations are we talking? And do you have something real-worldish where that difference is observable?

@simonjbeaumont
Copy link
Contributor Author

Motivation

Scheduling a timer is currently an expensive operation because this is done using EventLoop.scheduleTask(in:_:), which results in several allocations. A simple timer use case does not typically need all the functionality of this API and use cases that set many timers (e.g. for timeouts) pay this cost repeatedly.

@simonjbeaumont Could you add information here on why EventLoop.scheduleTask results in more allocations than this? It shouldn't, maybe that's fixable? EventLoop.scheduledTask is meant to kinda be the simplest timer API but the PR contrasts it with "simple timers" which suggests that I'm missing something here.

Great questions, and sorry if the write up in the PR description wasn't detailed enough.

IIUC, scheduleTask will allocate a bunch because it takes closures and needs promises. For cases where you just want a callback to an object you own, e.g. for setting repeated timers, this can be wasteful.

Also how many allocations are we talking?

I have added benchmarks and their results to the PR description. We're talking 4 allocations for using scheduleTask and none for scheduleCallback when using SelectableEventLoop.

Note that the default implementation for event loops of scheduleCallback is backed by scheduleTask so there won't be a win here, but it provides an opportunity for event loops to provide a custom implementation, which this PR has for SelectableEventLoop.

And do you have something real-worldish where that difference is observable?

A real-world use case would be something like QUIC, where IIUC it makes use of lots of timers in the protocol. I also think that grpc-swift is likely to benefit from this.

One thing this discussion has prompted me to do is to benchmark repeated scheduleCallback vs using scheduleRepeatedTask to see if there is some amortised benefit to the latter that I've overlooked.

@FranzBusch
Copy link
Member

@simonjbeaumont Could you add information here on why EventLoop.scheduleTask results in more allocations than this? It shouldn't, maybe that's fixable? EventLoop.scheduledTask is meant to kinda be the simplest timer API but the PR contrasts it with "simple timers" which suggests that I'm missing something here.

FWIW, I tried to make EL.scheduleTask as cheap as possible in the past already and tried to minimise allocations. IRCC I hit a point where I couldn't remove any more of the allocations because that would have had public API impact. One of them is for example the future that we allocate which we can completely avoid with this new API.

@Lukasa
Copy link
Contributor

Lukasa commented Jul 26, 2024

Candidly, there's almost no way to make scheduleTask fast in the current API. scheduleTask is required to return a Scheduled<T>, which looks like this:

public struct Scheduled<T> {
@usableFromInline typealias CancelationCallback = @Sendable () -> Void
@usableFromInline let _promise: EventLoopPromise<T>
@usableFromInline let _cancellationTask: CancelationCallback
@inlinable
@preconcurrency
public init(promise: EventLoopPromise<T>, cancellationTask: @escaping @Sendable () -> Void) {
self._promise = promise
self._cancellationTask = cancellationTask
}
/// Try to cancel the execution of the scheduled task.
///
/// Whether this is successful depends on whether the execution of the task already begun.
/// This means that cancellation is not guaranteed.
@inlinable
public func cancel() {
self._promise.fail(EventLoopError._cancelled)
self._cancellationTask()
}
/// Returns the `EventLoopFuture` which will be notified once the execution of the scheduled task completes.
@inlinable
public var futureResult: EventLoopFuture<T> {
self._promise.futureResult
}
}

This has the minimum cost of 1 promise, which is 1 allocation. However, in SelectableEventLoop it requires two allocations, as our cancellationTask closes over self and a taskId, so we have to heap-allocate the closure context:

cancellationTask: {
self._tasksLock.withLock { () -> Void in
self._scheduledTasks.removeFirst(where: { $0.id == taskId })
}
// We don't need to wake up the selector here, the scheduled task will never be picked up. Waking up the
// selector would mean that we may be able to recalculate the shutdown to a later date. The cost of not
// doing the recalculation is one potentially unnecessary wakeup which is exactly what we're
// saving here. So in the worst case, we didn't do a performance optimisation, in the best case, we saved
// one wakeup.
}

But it's worse! The internal datastructure that SelectableEventLoop uses to store the scheduled task is a ScheduledTask (natch). That looks like this:

internal struct ScheduledTask {
/// The id of the scheduled task.
///
/// - Important: This id has two purposes. First, it is used to give this struct an identity so that we can implement ``Equatable``
/// Second, it is used to give the tasks an order which we use to execute them.
/// This means, the ids need to be unique for a given ``SelectableEventLoop`` and they need to be in ascending order.
@usableFromInline
let id: UInt64
let task: () -> Void
private let failFn: (Error) -> Void
@usableFromInline
internal let readyTime: NIODeadline
@usableFromInline
init(id: UInt64, _ task: @escaping () -> Void, _ failFn: @escaping (Error) -> Void, _ time: NIODeadline) {
self.id = id
self.task = task
self.failFn = failFn
self.readyTime = time
}
func fail(_ error: Error) {
failFn(error)
}
}

Note that we have two more closures. How many of them heap allocate? At least one:

{
do {
promise.succeed(try task())
} catch let err {
promise.fail(err)
}
},

As all closures over closures must allocate.

All this scaffolding means that this timer cannot help but cause 3 allocations. None of these can go away without losing the ability to handle cancellation, or breaking the existing API surface: the promise is clearly necessary, the cancel function is API and can't be removed (though we could do a deprecation cycle to remove it, probably), and the mere existence of the promise in alloc 1 forces alloc 3.

Additionally, the nature of .scheduleTask as an API encourages you to take another allocation, because it takes a closure. With careful design you can make that go away (by closing only over a class reference), but most users won't notice this.

@weissi
Copy link
Member

weissi commented Jul 26, 2024

Candidly, there's almost no way to make scheduleTask fast in the current API.

Would it be worth thinking about improving the API with something that is very similar but doesn't suffer all these issues?

I have added benchmarks and their results to the PR description. We're talking 4 allocations for using scheduleTask and none for scheduleCallback when using SelectableEventLoop.

Right, in a naive benchmark, 4 mallocs & 4 frees of allocations that are small and don't last very long is 59 nanoseconds on my machine. Please don't interpret that as saving allocations is not worthwhile, quite the opposite! But inventing duplicate APIs should be well motivated in my opinion.

// run as swiftc -O test.swift && time ./test
import Darwin

@inline(never)
@_optimize(none)
func blackhole(
    _ a: UnsafeMutableRawPointer,
    _ b: UnsafeMutableRawPointer,
    _ c: UnsafeMutableRawPointer,
    _ d: UnsafeMutableRawPointer
) {
    free(a)
    free(b)
    free(c)
    free(d)
}

for _ in 0..<1_000_000_000 {
    guard let a = malloc(24),
    let b = malloc(32),
    let c = malloc(48),
    let d = malloc(48) else {
        preconditionFailure()
    }

    blackhole(a, b, c, d)
}
// $ swiftc -O test.swift && time ./test
//
// real	0m59.349s
// user	0m58.970s
// sys	0m0.360s

A real-world use case would be something like QUIC, where IIUC it makes use of lots of timers in the protocol. I also think that grpc-swift is likely to benefit from this.

I meant a real world use case where the ScheduledTask overhead is large enough that it's actually observable. I understand that QUIC, gRPC and many others use a lot of scheduled tasks, but 59 ns is very very little. Add an order of magnitude, let's call it 500 ns, that's still very little. So we need to be confident that any processing we would do in the timers doesn't totally dwarf the allocation time.

@Lukasa
Copy link
Contributor

Lukasa commented Jul 26, 2024

Why are we adding only one order of magnitude? Each of these timers is kept per-connection, so the correct order of multiplicand is "number of connections / number of cores".

This API can be made very similar to scheduleTask, by having it take a closure instead. In that case, it's an almost perfect replacement, except that it drops the problematic promise. This does open the user up to the sharp edge of the closure context, but expert users can avoid that sharp edge.

But I'd argue this makes your objection worse, not better. That API is much closer to the existing one than the current proposal, which increases confusion instead of decreasing it. We cannot deprecate the existing API, because it's a protocol requirement: event loops must implement it, and indeed it must be used as the fallback implementation for the new API.

@Lukasa
Copy link
Contributor

Lukasa commented Jul 26, 2024

As a related sidebar, these timers are typically set on the event loop, so they are a source of latency across the loop as a whole: we can't do I/O for any time where we're allocating memory for these timers.

@weissi
Copy link
Member

weissi commented Jul 26, 2024

Why are we adding only one order of magnitude? Each of these timers is kept per-connection, so the correct order of multiplicand is "number of connections / number of cores".

I meant adding an order of magnitude for each. In a best-case scenario 4 allocs&frees are 59ns (all the four, not each one). I'm suggesting to assume they 4 allocs & 4 frees come in at 500ns (again, sum of 4 allocs & 4 frees).

This API can be made very similar to scheduleTask, by having it take a closure instead. In that case, it's an almost perfect replacement, except that it drops the problematic promise. This does open the user up to the sharp edge of the closure context, but expert users can avoid that sharp edge.

Yeah, that's what I mean, why not add an API that's the same as the current one except for no ScheduledTask return (and other niche things).

But I'd argue this makes your objection worse, not better. That API is much closer to the existing one than the current proposal, which increases confusion instead of decreasing it. We cannot deprecate the existing API, because it's a protocol requirement: event loops must implement it, and indeed it must be used as the fallback implementation for the new API.

Just so I get it right, your argument is that having a new scheduleTaskFireAndForget(in: TimeAmount, body: @escaping () -> Void) -> Void (name tbd) is confusing? I can sympathise with that but I think we can find a resolution.

My thinking was/is: Let's start with a use case that makes an overhead of 50 to 500ns per schedule actually observable, then let's see what that use case needs. Maybe it needs cancellation support, maybe it doesn't, who knows. It makes it easier to design a new API if we actually now precisely what's necessary.

For example: I mostly do need the returned ScheduledTask such that I'm able to clean up. And I also need the .futureResult usually such that I can schedule the next time (if not expressible by repeated schedule task).

The protocol requirement isn't an issue IMHO. The current API is quite expressive and supports a lot of use cases. But I won't deny that there are use cases where we don't need all features. (I have had those too)

As a related sidebar, these timers are typically set on the event loop, so they are a source of latency across the loop as a whole: we can't do I/O for any time where we're allocating memory for these timers.

Completely. But there's much more overhead than just allocations. Literally the only thing I'm suggesting is a motivation that shows this as an actual win. I know it's a theoretical win but just saving the allocations might be small enough that it's not obversable.

And finally: Something that takes 500ns (which is the 10x'd cost of our allocations) can be done 200M per second per core (but that'd fully consume it). So that's a lot.

@Lukasa
Copy link
Contributor

Lukasa commented Jul 26, 2024

I'm struggling a bit with this feedback, because when I read it it seems like you're asking for two separate things, without clarifying whether they're an AND or an OR.

Yeah, that's what I mean, why not add an API that's the same as the current one except for no ScheduledTask return (and other niche things).

The API is almost identical in function to the current one. It's not fire-and-forget. It's not even that we don't return a ScheduledTask: this API supports cancellation by returning a token, it notifies the callee about that cancellation so we can handle EL quiescing, so it's just as capable of being cancelled as the existing API. The only thing it doesn't have is a Promise.

This is because the Promise on the current one is weird. When we complete or cancel a Scheduled right now, we don't tell you once, we tell you twice: once via Promise and once via callback. That's a very confused idea: why are there two ways? These are much more naturally two APIs: one that fires a callback, and one that takes a promise:

protocol EventLoop {
    func scheduleTask(in: TimeAmount, @Sendable @escaping () -> Void, onCancel: @Sendable @escaping () -> Void) -> TaskHandle

    func scheduleTask(in: TimeAmount, notifying promise: EventLoopPromise<Void>) -> TaskHandle
}

struct TaskHandle {
    func cancel()
}

Of course, each of these APIs could easily be implemented in terms of the other.

But the current API is weirdly both of these at the same time. You pass a callback, but you also get a promise. The two are both notified in the same way. That's weird, and hard to justify.

The new API says "screw it, just do the callback". But it acknowledges that we already have an API and it can do quite a lot, so in an attempt to differentiate itself it becomes far more restrictive. Essentially, it forces the user into a pattern that will allow amortized zero-allocation timers, by ensuring that you always give us something to call rather than us just invoking a closure.

But as you know very well, there is no difference between a delegate and a pair of closures, so we could take the pair of closures instead. The downside is that this API gets very close to obviating ScheduledTask, and it gets quite hard for us to tell users when they should use one or the other. I honestly thought the better thing was to provide a more differentiated API, not a less differentiated one.

While we're here, the 59ns argument isn't interesting because Si has already written a benchmark, which is available in his original post but I will reproduce here:

MTELG.scheduleTask(in:_:)
╒═════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric                  │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞═════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ Malloc (total) *        │       4 │       4 │       4 │       4 │       4 │       4 │       4 │     520 │
├─────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (ns) * │     447 │     719 │     773 │     834 │     908 │    1098 │    1348 │     520 │
╘═════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛
MTELG.setTimer(for:_:) without NIOCustomTimerImplementation conformance
╒═════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric                  │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞═════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ Malloc (total) *        │       5 │       5 │       5 │       5 │       5 │       5 │       5 │     476 │
├─────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (ns) * │     482 │     760 │     823 │     905 │     966 │    1119 │    1364 │     476 │
╘═════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛
MTELG.setTimer(for:_:) with NIOCustomTimerImplementation conformance
╒═════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric                  │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞═════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ Malloc (total) *        │       0 │       0 │       0 │       0 │       0 │       0 │       0 │    1071 │
├─────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (ns) * │     178 │     278 │     327 │     383 │     434 │     535 │     664 │    1071 │
╘═════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

P50 saves 446ns from a 773ns operation, p90 saves 474ns from a 908ns operation, and p99 saves 563ns from a 1.1us operation.

All of these are things that you can get away with doing millions of times in a second, but all of these are things that are likely to be done many thousands of times per second. As an example, QUIC connections will typically set at least two per-packet timers per connection (idle timer and probe timer). Using ScheduledTask in this context gives little room to move, and the EL offers no other timer solution. If we need a cheaper one, we can work out how to do that, but I don't think we have many ways to get cheaper than the implementation offered here.

@weissi
Copy link
Member

weissi commented Jul 29, 2024

Thanks @Lukasa for writing this. I think this is the missing motivation. I don't think the real motivation are the savings:

P50 saves 446ns from a 773ns operation, p90 saves 474ns from a 908ns operation, and p99 saves 563ns from a 1.1us operation.

That's nice and exactly in line with the prediction of 59ns just for allocs which we upped by 10x (to account for worse cases and the other overhead like initialising the classes in those allocs) to 500ns. Of course that's fantastic to save this but without a use case where that makes even a 1% difference I didn't (and don't) think it's worth to invent new API today before that use case actually is there.

But your post does give the real reason (which is much much better than the performance): The current API is weird and not as performant as it could be. I still think a real world use case motivating doing the change now would be nice (as opposed to when the use case is there which would provide more information). But I think your reasoning is good. My feedback to @simonjbeaumont is then: Copy Cory's post as the motivation (weird API) with the nice side effect: Your programs may also run a tiny bit faster if you schedule loads of timers.

But as you know very well, there is no difference between a delegate and a pair of closures, so we could take the pair of closures instead. The downside is that this API gets very close to obviating ScheduledTask, and it gets quite hard for us to tell users when they should use one or the other. I honestly thought the better thing was to provide a more differentiated API, not a less differentiated one.

I would've thought making it more aligned is better as it makes it easier for people to migrate from weird API to better API but I can see different arguments here.

Last question: Do we think we need to keep the old API (apart for SemVer reasons) or can we make the new API a complete replacement?

@simonjbeaumont
Copy link
Contributor Author

OK, @weissi; I updated the PR description to emphasise the API clarity and demote the performance win as a nice side-effect for the use cases for which it would matter.

@weissi
Copy link
Member

weissi commented Jul 30, 2024

OK, @weissi; I updated the PR description to emphasise the API clarity and demote the performance win as a nice side-effect for the use cases for which it would matter.

Awesome, that makes much more sense to me (maybe adjust the title too).

@simonjbeaumont simonjbeaumont changed the title Add EventLoop APIs for cheap scheduling of callbacks Add EventLoop APIs for simpler scheduling of callbacks Jul 30, 2024
@simonjbeaumont
Copy link
Contributor Author

Looks like @weissi is happy now with the motivation (not sure if that included a code review), and @glbrntt has approved.

@Lukasa, @FranzBusch: this looks like it's waiting for a re-review from one/both of you.

Benchmark(
"MTELG.scheduleTask(in:_:)",
configuration: Benchmark.Configuration(
metrics: [.mallocCountTotal, .instructions],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use our defaultMetrics here? We don't have instruction based benchmarks yet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ISTR using .instructions because you asked me too, in place of wall clock time. I can use defaultMetrics though, sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

"MTELG.scheduleTask(in:_:)",
configuration: Benchmark.Configuration(
metrics: [.mallocCountTotal, .instructions],
scalingFactor: .kilo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we align the maximum duration/iterations with #2839

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I gently push back on that ask.

You're asking me to align with a new precedent in an un-merged PR that was opened well after this one, instead of this PR keeping the conventions of the branch it is targeting.

This PR has been subject to a lot of scope creep already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it's been merged already. So I've now merged this PR with main and updated the benchmarks to use the same configuration as the rest.

/// implicitly, if it was still pending when the event loop was shut down.
///
/// - Parameter eventLoop: The event loop on which the callback was scheduled.
func onCancelScheduledCallback(eventLoop: some EventLoop)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultra naming nit: Should this be didCancelScheduledCallback? This follows how Swift APIs are often called when something did happen. on is more used when it is about to happen or in the process of happening. e.g. viewDidLoad in UIKit vs onAppear in SwiftUI

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that on is also used for when something has happened but is more ambiguous about whether it's after. IIRC it's will that implies before. E.g. willSet and didSet.

A quick git-grep in the NIO repo shows a use of both on and did quite a bit.

However, it's fair that the only public API uses did:

Sources/NIOCore/AsyncSequences/NIOAsyncSequenceProducerStrategies.swift:37:        public mutating func didYield(bufferDepth: Int) -> Bool {
Sources/NIOCore/AsyncSequences/NIOAsyncSequenceProducerStrategies.swift:42:        public mutating func didConsume(bufferDepth: Int) -> Bool {
Sources/NIOCore/AsyncSequences/NIOAsyncWriter.swift:70:    public func didYield(_ element: Element) {
Sources/NIOPosix/PendingDatagramWritesManager.swift:249:    public mutating func didWrite(
Sources/NIOPosix/PendingWritesManager.swift:201:    public mutating func didWrite(

I can change to using did 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Sources/NIOCore/NIOScheduledCallback.swift Show resolved Hide resolved
///
/// - NOTE: This property is for event loop implementors only.
@inlinable
public var customCallbackID: UInt64? {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: Do we need the custom here in the naming?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it adds value when glancing at it as the property name implies that it's only relevant for custom implementations. How strongly do you feel about it. It's public API so if there's a consensus that this needs a different name I'll suck it up 😄

at deadline: NIODeadline,
handler: some NIOScheduledCallbackHandler
) throws -> NIOScheduledCallback {
let taskID = self.scheduledTaskCounter.loadThenWrappingIncrement(ordering: .relaxed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something for the future potentially. With this new fast implementation to schedule callbacks it might be worth looking at the normal scheduleTask APIs again to see if we can revamp their internal implementation to avoid some of the allocations that they currently have. We can't get rid of all of them i.e. the returned ELP/ELF but potentially the internal allocs where we close over state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I think @Lukasa's tome comment here will provide good content for someone wanting to see if we can improve it: #2759 (comment).

But as to whether we should, it sounded like the back and forth on the motivation was that the older APIs are a little confusing to hold. Without wanting to open another can of worms, I guess a bigger question is: are there real use cases for scheduleTask that cannot be implemented with scheduleCallback? If the answer is, no, then should we consider deprecating the old one given the discussion about its confusing shape?

Copy link
Contributor Author

@simonjbeaumont simonjbeaumont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FranzBusch addressed your feedback, thanks!

"MTELG.scheduleTask(in:_:)",
configuration: Benchmark.Configuration(
metrics: [.mallocCountTotal, .instructions],
scalingFactor: .kilo
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it's been merged already. So I've now merged this PR with main and updated the benchmarks to use the same configuration as the rest.

Benchmark(
"MTELG.scheduleTask(in:_:)",
configuration: Benchmark.Configuration(
metrics: [.mallocCountTotal, .instructions],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 76 to 77
let group = MultiThreadedEventLoopGroup(numberOfThreads: 1)
defer { try! group.syncShutdownGracefully() }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I still disagree with this feedback and think we should change the other benchmarks too for local reasoning reasons, I'm more interested in converging this PR, so I've updated to accommodate this feedback.

/// implicitly, if it was still pending when the event loop was shut down.
///
/// - Parameter eventLoop: The event loop on which the callback was scheduled.
func onCancelScheduledCallback(eventLoop: some EventLoop)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@simonjbeaumont simonjbeaumont added the semver/minor Adds new public API. label Aug 15, 2024
@simonjbeaumont
Copy link
Contributor Author

Gentle ping. 🥺

Copy link
Member

@FranzBusch FranzBusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM me now. I left one perf suggestion which I think should be noticeable on a micro level. If @Lukasa is also happy with I am happy to merge this

@usableFromInline
enum Kind {
case task(task: () -> Void, failFn: (Error) -> Void)
case callback(any NIOScheduledCallbackHandler)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One performance thought here. Currently we are storing an existential callback handler. However, we could just store the two closures itself which we can get while we are in the generic method. This way we would avoid calling through an existential on every scheduled callback task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The requirements for protocol NIOScheduledCallbackHandler are generic functions. Is there a way I can store these as closures in the enum associated value, without making the Kind generic?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to add the second half of this. Yes this is why I think we should go back to any EventLoop. I assume the perf benefit outweighs this. This is an assumption but the scheduling and running of tasks is probably hotter than whatever we do in their callback with the passed EL. @Lukasa WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functions being generic don't really matter here: we can store generic closures in the Kind type without promoting the generic to the Kind type itself. We won't get specialization, but that's fine.

So TL;DR: yes, changing ScheduledEventLoop's representation to a pair of closures is probably the right thing to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver/minor Adds new public API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants