Async Signals #1043

Merged
merged 3 commits into from
Mar 11, 2025

TitanNano
Contributor

@TitanNano TitanNano commented Feb 9, 2025

This was developed last year in #261 and consists of two somewhat independent parts:

  • A Future for Signal: an implementation of the Future trait for Godot's signals.
  • Async runtime for Godot: a wrapper around Godot's deferred code execution that acts as an async runtime for Rust futures.

The SignalFuture does not depend on the async runtime and vice versa, but there is no point in having a future without a way to execute it.

For limitations see: #261 (comment)

Example

let node = Node::new_gd();

// spawn a new async task
godot_task(async move {
    // do something before waiting for a signal
    let children = node.get_children();
    
    // await a signal
    let _: () = Signal::from_object_signal(&node, "tree_entered").to_future().await;

    // do more after the signal
    children.iter_shared().for_each(|child| ... );
});
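
To illustrate how the two parts described above could fit together, here is a minimal std-only sketch. It is not the code from this PR: `Signal`, `SignalFuture`, `Runtime` and `run_deferred` are hypothetical stand-ins, and the queue drained by `run_deferred` only imitates a deferred engine callback. The point is that emitting a signal merely queues the waiting task, and the task is polled later.

```rust
use std::cell::{Cell, RefCell};
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::{future::Future, pin::Pin, rc::Rc};

/// Slot shared between the emitter and the future that awaits it.
struct Slot<T> {
    value: Option<T>,
    waker: Option<Waker>,
}

/// Hypothetical stand-in for a signal; not the gdext `Signal` type.
pub struct Signal<T>(Rc<RefCell<Slot<T>>>);
pub struct SignalFuture<T>(Rc<RefCell<Slot<T>>>);

impl<T> Signal<T> {
    pub fn new() -> Self {
        Signal(Rc::new(RefCell::new(Slot { value: None, waker: None })))
    }

    pub fn to_future(&self) -> SignalFuture<T> {
        SignalFuture(self.0.clone())
    }

    /// Store the emitted value, then wake whichever task is waiting for it.
    pub fn emit(&self, value: T) {
        self.0.borrow_mut().value = Some(value);
        let waker = self.0.borrow_mut().waker.take();
        if let Some(waker) = waker {
            waker.wake();
        }
    }
}

impl<T> Future for SignalFuture<T> {
    type Output = T;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut slot = self.0.borrow_mut();
        if let Some(value) = slot.value.take() {
            return Poll::Ready(value);
        }
        slot.waker = Some(cx.waker().clone()); // not emitted yet: remember the waker
        Poll::Pending
    }
}

/// Waking only queues the task id; the poll happens later, in `run_deferred`.
struct TaskWaker(usize, Arc<Mutex<VecDeque<usize>>>);

impl Wake for TaskWaker {
    fn wake(self: Arc<Self>) {
        self.1.lock().unwrap().push_back(self.0);
    }
}

#[derive(Default)]
pub struct Runtime {
    tasks: Vec<Option<Pin<Box<dyn Future<Output = ()>>>>>,
    queue: Arc<Mutex<VecDeque<usize>>>,
}

impl Runtime {
    pub fn spawn(&mut self, future: impl Future<Output = ()> + 'static) {
        let task: Pin<Box<dyn Future<Output = ()>>> = Box::pin(future);
        self.tasks.push(Some(task));
        self.queue.lock().unwrap().push_back(self.tasks.len() - 1);
    }

    /// Stand-in for a deferred engine callback: poll every task that was woken.
    pub fn run_deferred(&mut self) {
        loop {
            let next = self.queue.lock().unwrap().pop_front();
            let Some(id) = next else { break };
            let Some(task) = self.tasks[id].as_mut() else { continue }; // already finished
            let waker = Waker::from(Arc::new(TaskWaker(id, self.queue.clone())));
            let mut cx = Context::from_waker(&waker);
            if task.as_mut().poll(&mut cx).is_ready() {
                self.tasks[id] = None; // done: drop the future
            }
        }
    }
}

/// Returns the task's result as seen (before, after) the deferred poll.
pub fn demo() -> (i32, i32) {
    let (signal, mut runtime) = (Signal::<i32>::new(), Runtime::default());
    let (future, result) = (signal.to_future(), Rc::new(Cell::new(0)));
    let out = result.clone();
    runtime.spawn(async move { out.set(future.await) });
    runtime.run_deferred(); // first poll: the task parks at the await
    signal.emit(7); // wakes the task, which only queues it
    let before = result.get();
    runtime.run_deferred(); // deferred poll: the task resumes and finishes
    (before, result.get())
}
```

In this sketch a wake-up that arrives after a task has finished is simply skipped (the `continue` branch), which is one possible way to treat wakers that outlive their future.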

TODOs

  • Decide if we want to keep the GuaranteedSignalFuture. Should it be the default? (We keep it as TrySignalFuture, the plain signal is a wrapper that panics in the error case.)
  • Documentation
  • Figure out async testing.
  • Deal with async panics (in tests).

CC @jrb0001 because they provided very valuable feedback while refining the POC.
Closes #261

@GodotRust

API docs are being generated and will be shortly available at: https://godot-rust.github.io/docs/gdext/pr-1043

Member

@Bromeon Bromeon left a comment

Thanks a lot, this is very cool!

From the title I was first worried this might cause many conflicts with #1000, but it seems like it's mostly orthogonal, which is nice 🙂

I have only seen the first 1-2 files, will review more at a later point. Is there maybe an example, or should we just check tests?

@TitanNano TitanNano mentioned this pull request Feb 10, 2025
@TitanNano TitanNano force-pushed the jovan/async_rt branch 3 times, most recently from 2877010 to 9687f3b Compare February 10, 2025 21:19
@Bromeon Bromeon added the feature Adds functionality to the library label Feb 10, 2025
@jrb0001
Contributor

jrb0001 commented Feb 10, 2025

I am currently testing it with my project.

  • Executor from this PR and signals from my old implementation (based on async_channel) seem to work in-game.
  • My old executor (based on async_task running once per frame) and signals from this PR is the next step, hopefully tomorrow.
  • Both Executor and signals from this PR will come after that. I expect some issues with recursive signals but let's see.
  • I am getting a weird segfault on hotreloading with a completely useless backtrace which didn't happen with my executor implementation. I need to debug this more, but I suspect it is related to having a tool node spawn a future which listens on its signals and/or a signal>drop>signal>drop something else>signal chain.

@lilizoey
Member

* I am getting a weird segfault on hotreloading with a completely useless backtrace which didn't happen with my executor implementation. I need to debug this more, but I suspect it is related to having a tool node spawn a future which listens on its signals and/or a signal>drop>signal>drop something else>signal chain.

i'd guess it's related to using thread_local here which we need to do some hacky stuff to support with hot-reloading enabled

@TitanNano
Contributor Author

i'd guess it's related to using thread_local here which we need to do some hacky stuff to support with hot-reloading enabled

Shouldn't the hot-reload hack only leak memory? 🤔

@jrb0001 does the segfault occur on every hot-reload?

@jrb0001
Contributor

jrb0001 commented Feb 11, 2025

i'd guess it's related to using thread_local here which we need to do some hacky stuff to support with hot-reloading enabled

Shouldn't the hot-reload hack only leak memory? 🤔

@jrb0001 does the segfault occur on every hot-reload?

I am not completely sure yet. It doesn't happen if there are no open scenes or if none of them contains a node which spawns a Future.

It also doesn't seem to happen every single time if I close all scenes and then open one with a Future before triggering the hot-reload. In this case it panics with some scenes:

ERROR: godot-rust function call failed: <Callable>::GodotWaker::wake()
    Reason: [panic]  Future no longer exists when waking it! This is a bug!
  at /home/jrb0001/.cargo/git/checkouts/gdext-3ec94bd991a90eb6/2877010/godot-core/src/builtin/async_runtime.rs:271

With another scene it segfaults in this scenario.

Simply reopening the editor (same scene gets opened automatically) and then triggering a hot-reload segfaults for both scenes.

With both executor + Future from this PR, the hot-reload issue doesn't happen at all?!? So the issue could also be in my code, let me debug it properly before you waste more time on it.

I will do some more debugging later this week (probably weekend).


I also finished testing the Future part of the PR and it works fine with both my old executor and your executor in my relatively simple usage.

Unfortunately all my complex usages (recursion, dropping, etc.) need a futures_lite::Stream which I can't implement on top of your GuaranteedSignalFuture without potentially missing (or duplicating?) some signals while reconnecting with a new Future instance.

The R: Debug bound on to_future()/to_guaranteed_future() was a bit annoying and doesn't seem to be used? Or did I miss something?

@TitanNano
Contributor Author

The R: Debug bound on to_future()/to_guaranteed_future() was a bit annoying and doesn't seem to be used? Or did I miss something?

Yeah, it's completely unnecessary now. Probably an old artifact. I removed the bound.


Unfortunately all my complex usages (recursion, dropping, etc.) need a futures_lite::Stream which I can't implement on top of your GuaranteedSignalFuture without potentially missing (or duplicating?) some signals while reconnecting with a new Future instance.

Can you elaborate what the issue here is?


I'm also curious what your use case for the GuaranteedSignalFuture is. Currently, I'm still thinking about getting rid of it again. I have never come across a future that resolves when the underlying source disappears, and I wonder if it is really that useful for most users. But maybe you can share how it's important for you.

@TitanNano
Contributor Author

TitanNano commented Feb 12, 2025

ERROR: godot-rust function call failed: <Callable>::GodotWaker::wake()
    Reason: [panic]  Future no longer exists when waking it! This is a bug!
  at /home/jrb0001/.cargo/git/checkouts/gdext-3ec94bd991a90eb6/2877010/godot-core/src/builtin/async_runtime.rs:271

@jrb0001 Do you have an idea what could have triggered this? The only thing that I can think of is that a waker got cloned and reused after the future resolved. The panic probably doesn't make any sense, since the waker can technically be called an infinite number of times. 🤔

@TitanNano TitanNano force-pushed the jovan/async_rt branch 2 times, most recently from 071c97e to c58b657 Compare February 14, 2025 23:47
@TitanNano
Contributor Author

@Bromeon I now added a way to test async tasks. I still need to deal with panics inside a Future, though. Technically, we could unify the test execution of sync and async tasks, but I get the impression that it also would have some downsides. Keeping it separate adds a bit of duplication, but unifying it would force more complexity onto the execution of sync tasks.

Member

@Bromeon Bromeon left a comment

I've finally had some time to look more closely at this. Thanks so much for this great PR, outstanding work as always ❤️

Technically, we could unify the test execution of sync and async tasks, but I get the impression that it also would have some downsides. Keeping it separate adds a bit of duplication, but unifying it would force more complexity onto the execution of sync tasks.

I think you made the right choice here, it seems they're different enough to be treated differently. If it becomes bothersome in the future, we could always revise that decision; but I think keeping the sync tests simple is a good approach.

@TitanNano TitanNano force-pushed the jovan/async_rt branch 2 times, most recently from 20a53b7 to af7d58b Compare February 15, 2025 14:32
@jrb0001
Contributor

jrb0001 commented Feb 16, 2025

I'm also curious what your use case for the GuaranteedSignalFuture is. Currently, I'm still thinking about getting rid of it again. I have never come across a future that resolves when the underlying source disappears, and I wonder if it is really that useful for most users. But maybe you can share how it's important for you.

My experience seems to be the exact opposite of yours. Usually things like sockets and channels return Err/None/panic when the other side disappears. I don't think I have ever encountered a Future that gets stuck intentionally.

With Godot this isn't only caused by intentionally disconnecting a signal, but also when a node is freed, which can happen at any time and on a large scale. I don't like the idea of having hundreds or maybe even thousands of stuck tasks after the player changed scenes a few times.

I also think we shouldn't compare it to GDScript, for two reasons:

  • GDScript doesn't need to store any additional state, so it doesn't have a memory leak. Your runtime "leaks" memory through the thread-local if a task gets stuck.
  • Not sure how to explain this, but for me the direction behind them is different. GDScript (and Rust with Callable) is "Godot should call this method when ..." (Godot is the owner / pushing), while Future is "My future should wait until ..." (Future/Runtime is the owner / pulling). The Callable approach can detect the disconnect through NOTIFICATION_PREDELETE (GDScript) or drop() (Rust Callable), while the Future approach depends completely on the behavior of the signal future.

Your SignalFuture is usually enough and more ergonomic than the GuaranteedSignalFuture but I would make it panic on disconnect and make the Runtime clear the task on panic. The GuaranteedSignalFuture is still helpful if you need to wait for some signal and detect when the source disappears at the same time, without combining multiple signals, relying on catch_unwind() or a custom Drop impl.
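
The "resolve with an error when the source disappears" behavior discussed here can be sketched in plain Rust as follows. This is only an illustration under assumptions: `Source`, `TryFuture`, `SourceDropped`, `connect` and `poll_once` are hypothetical names, and the `Drop` impl on `Source` stands in for whatever notifies the future that the signal's object or connection is gone.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Error returned when the signal source is gone (hypothetical name).
#[derive(Debug, PartialEq)]
pub struct SourceDropped;

struct Shared<T> {
    value: Option<T>,
    source_alive: bool,
    waker: Option<Waker>,
}

/// Stand-in for the object that owns the signal connection.
pub struct Source<T>(Rc<RefCell<Shared<T>>>);
pub struct TryFuture<T>(Rc<RefCell<Shared<T>>>);

pub fn connect<T>() -> (Source<T>, TryFuture<T>) {
    let shared = Rc::new(RefCell::new(Shared { value: None, source_alive: true, waker: None }));
    (Source(shared.clone()), TryFuture(shared))
}

impl<T> Source<T> {
    pub fn emit(&self, value: T) {
        self.0.borrow_mut().value = Some(value);
        self.wake();
    }

    fn wake(&self) {
        let waker = self.0.borrow_mut().waker.take();
        if let Some(waker) = waker {
            waker.wake();
        }
    }
}

impl<T> Drop for Source<T> {
    /// Dropping the source marks the slot as dead and wakes the waiting task,
    /// so the future resolves with an error instead of staying pending forever.
    fn drop(&mut self) {
        self.0.borrow_mut().source_alive = false;
        self.wake();
    }
}

impl<T> Future for TryFuture<T> {
    type Output = Result<T, SourceDropped>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let mut shared = self.0.borrow_mut();
        if let Some(value) = shared.value.take() {
            return Poll::Ready(Ok(value));
        }
        if !shared.source_alive {
            return Poll::Ready(Err(SourceDropped));
        }
        shared.waker = Some(cx.waker().clone());
        Poll::Pending
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Poll a future once by hand, as an executor would after a wake-up.
pub fn poll_once<F: Future + Unpin>(future: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    Pin::new(future).poll(&mut cx)
}
```

A panicking variant, as suggested above, could wrap such a future and call `expect` on the `Result`; a runtime that clears the task on panic would then not keep stuck tasks around.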

I unfortunately didn't get to do my debugging session due to sickness. I will let you know once I have some results, but that will most likely be towards the end of the week or even weekend.

@Bromeon
Member

Bromeon commented Feb 16, 2025

Thanks a lot for the detailed insights, @jrb0001 👍

I'm trying to see it from a user perspective. A user would then have to make a choice whether the basic future is enough or the guaranteed one is needed, which may be... not a great abstraction?

How would you advise a library user to choose correctly here, without needing to know all the details? Does the choice even make sense, or should we sacrifice a bit of ergonomics for correctness?

@TitanNano
Contributor Author

My experience seems to be the exact opposite of yours. Usually things like sockets and channels return Err/None/panic when the other side disappears. I don't think I have ever encountered a Future that gets stuck intentionally.

I get this point, but I wouldn't say the future gets stuck intentionally. If you create a Godot Object and don't free it, then it leaks memory. That is also not intentional. From my point of view, async tasks must be stored and canceled before freeing the Object; this is simply an inherited requirement from the manually managed Node / Object. We can put this into the documentation of the TaskHandle. Maybe we also want to make the TaskHandle #[must_use]?

I also think making the SignalFuture panic if its Callable gets dropped would be a good compromise. This would highlight that something unexpected is happening.

@TitanNano TitanNano force-pushed the jovan/async_rt branch 3 times, most recently from 43b167c to 766bc95 Compare February 16, 2025 23:03
@Dheatly23
Contributor

Dheatly23 commented Feb 17, 2025

I get this point, but I wouldn't say the future gets stuck intentionally. If you create a Godot Object and don't free it, then it leaks memory. That is also not intentional. From my point of view, async tasks must be stored and canceled before freeing the Object, this is simply an inherited requirement from the manually managed Node / Object. We can put this into the documentation of the TaskHandle. Maybe we also want to make the TaskHandle #[must_use]?

But isn't manually cancelling the TaskHandle too much of a chore? Consider this simple GDScript example:

extends Button

func _pressed():
    await get_tree().create_timer(1.0).timeout
    print("Pressed one second before!")

If the button gets freed, the call is simply dropped without any cleanup code. But with your proposal we need to store all TaskHandles in the node and cancel them all on exit tree, am I right?

Small nitpick, but I disagree with naming it GuaranteedSignalFuture; it gives the impression that the future will resolve without errors. I suggest naming it TrySignalFuture to emphasize that the signal might never resolve (e.g. the node is removed). My potential use case is asynchronous task cleanup, like sending a final message or waiting/selecting on multiple signals.

@Bromeon
Member

Bromeon commented Feb 17, 2025

From the discussion, it's stated that the "guaranteed" future is less ergonomic to use than the regular one. At the same time, it seems like the regular one needs manual cleanup (thus being less ergonomic in its own way).

To be on the same page, could someone post similar usage examples for each of them? 🙂

@AsbjornOlling

AsbjornOlling commented Mar 10, 2025

A caveat of this solution (that I didn't see coming) is that it doesn't seem to work with message passing from an OS thread.

E.g. with code like this:

let (tx, mut rx) = tokio::sync::mpsc::channel(1024);
tx.blocking_send("Hello from the main thread".to_string()).unwrap();

std::thread::spawn(move || {
    tx.blocking_send("Hello from a worker thread!".to_string()).unwrap();
});

godot::task::spawn(async move {
    while let Some(text) = rx.recv().await {
        debug!("Received: {text:?}");
    }
});

The first message comes through perfectly, but the second message (the one sent from a thread) produces this error:

ERROR: godot-rust function call failed: <Callable>::GodotWaker::wake()
    Reason: [panic]
  Callable 'GodotWaker::wake' created with from_local_fn() must be called from the same thread it was created in.
  If you need to call it from any thread, use from_sync_fn() instead (requires `experimental-threads` feature).
  at /home/asbjorn/.cargo/git/checkouts/gdext-c10fa258906ea04a/e5b215b/godot-core/src/builtin/callable.rs:585
   at: godot_core::private::report_call_error (/home/asbjorn/.cargo/git/checkouts/gdext-c10fa258906ea04a/e5b215b/godot-core/src/private.rs:335)
ERROR: Error calling deferred method: '': .
   at: _call_function (core/object/message_queue.cpp:222)

It might be obvious to someone more familiar with async rust than me; but I at least found it surprising.

It seems like it's currently not possible to communicate between OS threads and async tasks in the godot runtime?
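
For context, in plain Rust terms a waker that may be invoked from any thread typically does nothing thread-sensitive itself: it only sends a notification, and the actual poll happens on the thread that owns the task. The sketch below shows that pattern with std only. It is not how this PR implements its waker; `MainThreadWaker`, `Recv` and `demo` are hypothetical names, and the blocking `recv()` loop stands in for the engine's main loop.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// Waker that is safe to call from any thread: it only notifies the main thread.
struct MainThreadWaker(Mutex<Sender<()>>);

impl Wake for MainThreadWaker {
    fn wake(self: Arc<Self>) {
        let _ = self.0.lock().unwrap().send(());
    }
}

struct Shared {
    message: Option<String>,
    waker: Option<Waker>,
}

/// One-shot future that resolves once a worker thread has stored a message.
struct Recv(Arc<Mutex<Shared>>);

impl Future for Recv {
    type Output = String;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<String> {
        let mut shared = self.0.lock().unwrap();
        if let Some(message) = shared.message.take() {
            return Poll::Ready(message);
        }
        shared.waker = Some(cx.waker().clone());
        Poll::Pending
    }
}

pub fn demo() -> String {
    let shared = Arc::new(Mutex::new(Shared { message: None, waker: None }));
    let (wake_tx, wake_rx) = mpsc::channel();
    let waker = Waker::from(Arc::new(MainThreadWaker(Mutex::new(wake_tx))));

    let worker_shared = shared.clone();
    let worker = thread::spawn(move || {
        // Store the message and take the waker under the same lock,
        // so a wake-up cannot be lost between the two steps.
        let waker = {
            let mut shared = worker_shared.lock().unwrap();
            shared.message = Some("Hello from a worker thread!".to_string());
            shared.waker.take()
        };
        if let Some(waker) = waker {
            waker.wake(); // called on the worker thread
        }
    });

    // "Main thread": poll, then sleep until some thread sends a wake-up.
    let mut future = Recv(shared);
    let mut cx = Context::from_waker(&waker);
    let message = loop {
        match Pin::new(&mut future).poll(&mut cx) {
            Poll::Ready(message) => break message,
            Poll::Pending => wake_rx.recv().unwrap(),
        }
    };
    worker.join().unwrap();
    message
}
```

The design choice here is that the future is only ever polled on the thread running the loop, regardless of which thread triggered the wake-up.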

Foldable: Some more context about my use-case for an async runtime in godot

I thought that an async runtime in godot would be perfect for my use case, since my gdextension relies on waiting for some CPU-heavy work that happens on a separate worker thread. It takes way longer to process one item than I can tolerate blocking rendering for, so the heavy work has to happen on a dedicated thread (and not a godot task).

My currently working solution regularly polls the output queues with try_recv() on _physics_process, and that seems to work fine. But once stuff gets a bit more stateful, it can quickly turn into some kind-of-complicated state machines.
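
The polling approach described above could look roughly like this, using only the standard library. `Poller` and `physics_frame` are hypothetical names, with `physics_frame` standing in for `_physics_process`; the `while` loop in `demo` imitates the engine's frame loop.

```rust
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::thread;

pub struct Poller {
    rx: Receiver<String>,
    pub received: Vec<String>,
    pub finished: bool,
}

impl Poller {
    /// Hypothetical per-frame hook: drain whatever the worker has produced so far,
    /// without ever blocking the frame.
    pub fn physics_frame(&mut self) {
        loop {
            match self.rx.try_recv() {
                Ok(item) => self.received.push(item),
                Err(TryRecvError::Empty) => break, // nothing new this frame
                Err(TryRecvError::Disconnected) => {
                    self.finished = true; // worker is done and the queue is empty
                    break;
                }
            }
        }
    }
}

pub fn demo() -> Vec<String> {
    let (tx, rx) = mpsc::channel();

    // CPU-heavy work would happen here, on a dedicated thread.
    let worker = thread::spawn(move || {
        for i in 0..3 {
            tx.send(format!("result {i}")).unwrap();
        }
    });

    let mut poller = Poller { rx, received: Vec::new(), finished: false };
    // Simulated frame loop.
    while !poller.finished {
        poller.physics_frame();
        thread::yield_now();
    }
    worker.join().unwrap();
    poller.received
}
```

Every additional piece of state (which item is expected next, what to do after it arrives) has to live in fields like `finished`, which is where the state machines mentioned above come from.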

Being able to write a long-running async task that waits for results from the cpu-bound thread, and then does the interactions with godot that need to happen on the main thread, would be really useful for me.

@TitanNano
Contributor Author

@AsbjornOlling this should be possible, but you have to enable the experimental-threads feature. This feature is off by default.

@AsbjornOlling

@AsbjornOlling this should be possible, but you have to enable the experimental-threads feature. This feature is off by default.

I have experimental-threads enabled.

If I run the same code from my example above without experimental-threads, I get this error instead:

attempted to access binding from different thread than main thread; this is UB - use the "experimental-threads" feature.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@AsbjornOlling

Nice! This seems to work ❤️

@TitanNano
Contributor Author

@AsbjornOlling thanks for pointing this out. It was indeed not working as intended.

@ColinWttt
Contributor

ColinWttt commented Mar 11, 2025

It seems that Gd<T> cannot be passed as a parameter in type-safe signals, due to Gd not being Send + Sync.

let task_handle = task::spawn(async move {
    let (ret,) = self.signals().active_card_ability().deref().to_future().await;
});

error[E0599]: the method `to_future` exists for reference `&TypedSignal<'_, CardManager, (Gd<AbilityContext>,)>`, but its trait bounds were not satisfied
   --> src\class\card_manager.rs:211:71
    |
211 |             let (ret,) = self.signals().active_card_ability().deref().to_future().await;
    |                                                                       ^^^^^^^^^ method cannot be called due to unsatisfied trait bounds
    |
   ::: ..\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib/rustlib/src/rust\library\core\src\cell.rs:311:1
    |
311 | pub struct Cell<T: ?Sized> {
    | -------------------------- doesn't satisfy `Cell<*mut __GdextClassInstance>: Sync`
    |
    = note: the following trait bounds were not satisfied:
            `*mut card_ability_context::AbilityContext: Sync`
            which is required by `(godot::prelude::Gd<card_ability_context::AbilityContext>,): Sync`
            `Cell<*mut __GdextClassInstance>: Sync`
            which is required by `(godot::prelude::Gd<card_ability_context::AbilityContext>,): Sync`
            `*mut card_ability_context::AbilityContext: Send`
            which is required by `(godot::prelude::Gd<card_ability_context::AbilityContext>,): Send`
            `*mut __GdextClassInstance: Send`
            which is required by `(godot::prelude::Gd<card_ability_context::AbilityContext>,): Send`

@Bromeon
Member

Bromeon commented Mar 11, 2025

It seems that Gd<T> cannot be passed as a parameter in type-safe signals, due to Gd not being Send + Sync.

This comes from the Send + Sync bounds here:

pub struct SignalFuture<R: ParamTuple + Sync + Send>(FallibleSignalFuture<R>);

Those bounds are unnecessary for signals that are emitted on the main thread. (Awaiting must anyway happen on the main thread).

Was the intention here to support also signals emitted on other threads, as a cross-thread communication mechanism? If yes, we should probably add this later -- might need more thought regarding thread safety, and probably some version of #18.

@TitanNano
Contributor Author

Those bounds are unnecessary for signals that are emitted on the main thread. (Awaiting must anyway happen on the main thread).

Was the intention here to support also signals emitted on other threads, as a cross-thread communication mechanism? If yes, we should probably add this later -- might need more thought regarding thread safety, and probably some version of #18.

Yes, it's basically impossible to tell where a signal will be emitted, since any signal can be emitted from any thread. We also use a RustCallable inside the future, and it has a Send + Sync bound. Making the future Send + Sync is the most reliable solution.

For Gd<T> I was thinking about a follow-up PR where I would like to work out something like the ThreadCrosser in itest but with a runtime check if the value actually was moved between threads. I haven't worked on this yet, though.
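
A runtime-checked wrapper of the kind mentioned here could be sketched as below. This is purely illustrative and not the ThreadCrosser from itest or any planned gdext API: `ThreadConfined` is a hypothetical name, and the `unsafe impl`s rest on the sketch-level argument given in the comments, which a real implementation would need to review much more carefully.

```rust
use std::mem::ManuallyDrop;
use std::rc::Rc;
use std::thread::{self, ThreadId};

/// Hypothetical wrapper: lets a non-`Send` value satisfy `Send + Sync` bounds,
/// but panics if it is ever accessed on a thread other than its creator.
pub struct ThreadConfined<T> {
    value: ManuallyDrop<T>,
    owner: ThreadId,
}

// SAFETY (sketch-level argument): `value` is only read or dropped on `owner`;
// every other thread either panics in `get` or leaks the value in `drop`.
unsafe impl<T> Send for ThreadConfined<T> {}
unsafe impl<T> Sync for ThreadConfined<T> {}

impl<T> ThreadConfined<T> {
    pub fn new(value: T) -> Self {
        Self { value: ManuallyDrop::new(value), owner: thread::current().id() }
    }

    pub fn get(&self) -> &T {
        assert_eq!(thread::current().id(), self.owner, "value crossed threads");
        &self.value
    }
}

impl<T> Drop for ThreadConfined<T> {
    fn drop(&mut self) {
        if thread::current().id() == self.owner {
            // SAFETY: we are on the owning thread and `value` is dropped only here.
            unsafe { ManuallyDrop::drop(&mut self.value) };
        }
        // On any other thread the value is deliberately leaked rather than dropped.
    }
}

/// Compile-time check that a value satisfies the `Send + Sync` bounds.
pub fn assert_send_sync<T: Send + Sync>(_: &T) {}

/// Same-thread access works; access after moving to another thread panics.
pub fn demo() -> (i32, bool) {
    let confined = ThreadConfined::new(Rc::new(5)); // `Rc` is neither `Send` nor `Sync`
    assert_send_sync(&confined);
    let local = **confined.get();
    let crossed = thread::spawn(move || **confined.get()).join();
    (local, crossed.is_err())
}
```

With such a wrapper, a value that never actually leaves its thread pays only a thread-id comparison per access, while a value that does cross threads fails loudly instead of causing undefined behavior.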

Member

@Bromeon Bromeon left a comment

Thanks so much for this massive feature, @TitanNano!

The current implementation seems solid enough to be merged. Hopefully this allows more people to test it out -- thanks also to everyone who has voiced their concerns in this thread, I encourage you to open new discussions where appropriate 🙂 Regarding the Gd issue, I definitely think this should be discussed, but such changes can happen in follow-up PRs.

🚀

@Bromeon Bromeon added this pull request to the merge queue Mar 11, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 11, 2025
@Bromeon Bromeon enabled auto-merge March 11, 2025 21:51
@Bromeon Bromeon added this pull request to the merge queue Mar 11, 2025
Merged via the queue into godot-rust:master with commit 1957e57 Mar 11, 2025
17 checks passed
@TitanNano TitanNano deleted the jovan/async_rt branch March 11, 2025 22:00
@TitanNano
Contributor Author

Thanks, everyone, for the great and productive feedback.

Labels
feature Adds functionality to the library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Async/Await for Signals
10 participants