-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[L0 v2] introduce raii wrapper for UR handles #2552
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem to be necessary to retain context whenever it's used to construct queue, etc. However, the spec is not very clear on this. I'm wondering if we should we update the spec or add retains on context?
hm, I think we should tie lifetime of context to queues/command buffers/programs/pools and all the other entities created from a context. But not kernels or individual enqueues. These retain their parent objects (like e.g., a queue), so the lifetime relationship between, e.g., an enqueue and context is transitive. So it should be safe.
Generally, my opinion is that it should be safe to release the context while there are some pending operations.
@@ -98,13 +98,14 @@ struct ur_event_handle_t_ : _ur_object { | |||
uint64_t getEventEndTimestamp(); | |||
|
|||
protected: | |||
// Do not use ref couting on context to avoid circular dependency. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need something like a weak pointer? You added quite a few comments like that. It would be better to enforce this using a type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I could just do an alias for this, and add some explanation in the comment above that alias definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion would be to create a weak<> type that has deleted copy constructors/operators and can only be created from a ref-counted ptr (using something like .downgrade() or whatever).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but in some situations there might be no rc<> instance. For example, when you create an eventPool in the context constructor, you need to pass ur_context_handle_t (this) to the eventPool. If it would only accept weak<> and if weak<> can only be created from rc<>, you would need a temporary rc<>. This is a bit verbose but on the other hand, you can use rc_val_only (to do ref count validation) so this could provider us with some more checks.
See the latest commit for how it would look like. Also, I guess if we decide to do that, then probably all handles in v2 should be replaced with rc<> rc_val_only<> or weak<>.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having weak<> could also allow us to check the pointers for NULL (only allow rc<> and weak<> to be != NULL). If we ever need the value to be null (2 places right now), we can use optional (see latest commit) - this can get a bit verbose, but we shouldn't rely on null values too much I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd prefer to have a weak be nullable, like normal std::weak. So that when the original rc pointer goes out of scope, we can detect that.
For example, when you create an eventPool in the context constructor, you need to pass ur_context_handle_t (this) to the eventPool.
That's a scenario for the rc_val_only
type, because context must outlive a queue/command buffer associated with the eventPool. I think weak is for when we create objects that might possibly outlive their "parent" objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a scenario for the rc_val_only type, because context must outlive a queue/command buffer associated with the eventPool. I think weak is for when we create objects that might possibly outlive their "parent" objects.
Yeah, but the problem is that if you use rc_val_only
then it will always fail during release
because of how urContextRelease (and other functions are implemented) - we first decrement the RefCount (to 0) and only then call a destructor of the context (which in turn will destroy the eventPools, etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tbh, I think I'll remove rc_val_only
, as it's only applicable to device and platforms and there's a small chance we will have lifetime issues with those. We could just add some checks to the device/platform classes if we really want to.
Yeah, that makes sense, I think at least CUDA and HIP are doing something similar. I will add this and perhaps some tests for that. |
Some entities (e.g. devices) do not need to be retained as they are owned by the platform. For such cases, only validate RefCount instead of acutally increasing/decreasing it.
66b449b
to
fe4922a
Compare
|
||
// This version of ref_counted validates that the ref count is not zero on every | ||
// release and retain in debug mode, and does nothing in the release mode. | ||
// Used for types that should always be alibe during the adapter lifetime (e.g. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Used for types that should always be alibe during the adapter lifetime (e.g. | |
// Used for types that should always be alive during the adapter lifetime (e.g. |
// Used for types that should always be alibe during the adapter lifetime (e.g. | ||
// devices). | ||
template <typename URHandle> | ||
using rc_val_only = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, I don't like the name :D
durable
, NonNull
, Live
, hm, dunno. Basically what we want to say is that this object's lifetime is guaranteed to exceed that of the containing object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, me too. non_null
/not_null
might good, or maybe checked
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i like checked
.
@@ -104,5 +104,114 @@ using ze_context_handle_t = | |||
using ze_command_list_handle_t = | |||
ze_handle_wrapper<::ze_command_list_handle_t, zeCommandListDestroy>; | |||
|
|||
template <typename URHandle, ur_result_t (*retain)(URHandle), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will be tricky to get right for people.
please create a small writeup in comments on when to use which type of pointer.
Some entities (e.g. devices) do not need to be retained as they are owned by the platform. For such cases, only validate RefCount instead of acutally increasing/decreasing it.
It doesn't seem to be necessary to retain context whenever it's used to construct queue, etc. However, the spec is not very clear on this. I'm wondering if we should we update the spec or add retains on context?
Also, there are a few instances where I think calling retain() might be justified (we are not calling it right now) - e.g. when passing ur_mem_handle_t to urKernelSetArgMemObj / urKernelSetArgSampler.