[SYCL][UR][Graph] Move L0 simultaneous graph synchronization from SYCL-RT to L0 adapter #17734

Open
EwanC opened this issue Mar 31, 2025 · 0 comments
EwanC commented Mar 31, 2025

In the SYCL-RT code in `graph_impl.cpp`, we currently have a special case for the Level Zero backend that blocks on previous executions of a graph: https://github.com/intel/llvm/blob/sycl/sycl/source/detail/graph_impl.cpp#L1005

This code is backend-specific, so it should be moved into the Unified Runtime Level-Zero v1 adapter in order to conform to the UR spec wording from #17658, which defines that a UR command-buffer should be able to be submitted while a previous submission is still executing.
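As a rough illustration of what moving the synchronization into the adapter means, the sketch below models a command-buffer whose enqueue entry point blocks the host on the previous execution before starting a new one. `CommandBufferModel`, `enqueue()` and `maxConcurrent()` are hypothetical names, device execution is simulated with `std::async`, and none of this is taken from the actual UR or Level Zero adapter code.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical model of an adapter-side command-buffer. This is NOT the UR or
// Level Zero API; "device" execution is simulated with std::async.
class CommandBufferModel {
public:
  explicit CommandBufferModel(std::chrono::milliseconds Duration)
      : Duration(Duration) {}

  // Hypothetical stand-in for the adapter's enqueue entry point. Before
  // launching a new execution, the adapter itself blocks the host until the
  // previous execution of this command-buffer has finished, so callers such
  // as the SYCL runtime need no backend-specific special case.
  std::shared_future<int> enqueue() {
    std::lock_guard<std::mutex> Lock(Mutex);
    if (LastSubmission.valid())
      LastSubmission.wait(); // host-blocking wait on the previous execution
    const int Id = NextId++;
    LastSubmission = std::async(std::launch::async, [this, Id] {
                       recordStart();
                       std::this_thread::sleep_for(Duration); // simulated work
                       --Running;
                       return Id;
                     }).share();
    return LastSubmission;
  }

  // Highest number of executions observed running at the same time.
  int maxConcurrent() const { return MaxConcurrent.load(); }

private:
  // Count this execution as running and update the observed maximum.
  void recordStart() {
    const int Now = ++Running;
    int Prev = MaxConcurrent.load();
    while (Now > Prev && !MaxConcurrent.compare_exchange_weak(Prev, Now)) {
    }
  }

  std::chrono::milliseconds Duration;
  std::atomic<int> Running{0};
  std::atomic<int> MaxConcurrent{0};
  int NextId = 0;
  std::mutex Mutex;
  // Declared last so it is destroyed first: releasing it waits for the final
  // simulated execution, which still uses the members above.
  std::shared_future<int> LastSubmission;
};
```

Because the wait lives inside `enqueue()`, a caller can submit the same command-buffer repeatedly and `maxConcurrent()` stays at 1 without any Level Zero-specific code in the caller. The trade-off in this sketch is that `enqueue()` itself blocks the calling thread.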

Note that the Level-Zero v2 adapter will implement this behavior in #17709.

EwanC self-assigned this Mar 31, 2025
kbenzie pushed a commit that referenced this issue Apr 3, 2025
To support the SYCL-Graph extension on an OpenCL backend, we currently
only require the presence of the `cl_khr_command_buffer` extension. This
PR introduces an extra requirement on the
[CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html#CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR)
capability being present.

This is based on the [graph execution
wording](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc#765-new-handler-member-functions)
on the definition of `handler::ext_oneapi_graph()` that:

> Only one instance of graph will execute at any time. If graph is
submitted multiple times, dependencies are automatically added by the
runtime to prevent concurrent executions of an identical graph.

Such usage results in multiple calls by the SYCL runtime to
`urEnqueueCommandBufferExp` with the same UR command-buffer and event
dependencies to prevent concurrent execution. Without support for
simultaneous use, the OpenCL adapter code cannot guarantee that the first
command-buffer submission has finished execution before it makes
subsequent `clEnqueueCommandBufferKHR` calls with the `cl_event`
dependencies. If the first submission is still executing, an error
will be reported.

Workarounds like adding blocking host waits to the OpenCL UR adapter are
possible, but requiring simultaneous use reflects the vendor
requirements for the current implementation. I've tried to
document all of this in the UR spec and SYCL-Graph design docs, and the PR
also includes a couple of cleanups I found along the way.

Note that the new CTS test fails for the Level-Zero adapter; I've
created #17734 to resolve this.

---------

Co-authored-by: Mikołaj Komar <[email protected]>
kbenzie pushed a commit to oneapi-src/unified-runtime that referenced this issue Apr 3, 2025