
[SYCL] Enable level zero v2 in DPC++ #16656

Merged
19 commits merged on Feb 28, 2025


omarahmed1111
Contributor

This adds a simple way to use the experimental level_zero_v2 adapter in DPC++. Fixes #16613.

@pbalcer
Contributor

pbalcer commented Jan 16, 2025

My opinion is that we should build and ship both adapters by default, and make which one is used toggleable at runtime through an env variable (like I suggested here).
Having both is often useful for performance comparisons, or simply as a fallback if one of the adapters is not behaving correctly.

@omarahmed1111
Contributor Author

> My opinion is that we should build and ship both adapters by default, and make it togglable which one is used at runtime through an env variable (like I suggested here). Having both is often useful for performance comparisons, or simply to have a fallback if one of the adapters is not behaving correctly.

Good point! Changed that.

@jbrodman
Contributor

By default, I think we should only build 1 adapter. We don't want to confuse users by enumerating the same device even more times than we already do (OpenCL vs Level Zero). I think it would be nice to support building both and toggling for the reasons stated above, but I'd also like the option to ONLY build v1 or v2.

@pbalcer
Contributor

pbalcer commented Jan 17, 2025

> By default, I think we should only build 1 adapter. We don't want to confuse users by enumerating the same device even more times than we already do (OpenCL vs Level Zero). I think it would be nice to support building both and toggling for the reasons stated above, but I'd also like the option to ONLY build v1 or v2.

The current proposal (and, from what I can tell, the implementation) is that both adapters will be physically present in the libs directory, but only one of them will be active at a time. A new environment variable, UR_ADAPTER_LEVEL_ZERO_V2=1, will let users toggle which adapter is used (long-term, I'd have preferred this to be an option of ONEAPI_DEVICE_SELECTOR). This ensures that L0 devices are not enumerated twice, while making it very simple for users to experiment with the new adapter.

Having to build SYCL from scratch with a special option might be challenging for developers of higher-level frameworks or applications that usually use a prebuilt compiler.
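To make the proposed runtime toggle concrete, here is a minimal Python model of it; it is not the actual loader code, which is C++ in the Unified Runtime. It assumes that both Level Zero adapters are installed side by side, that setting `UR_ADAPTER_LEVEL_ZERO_V2` to exactly `1` selects v2, and it uses placeholder library file names. The functions `select_level_zero_adapter` and `adapters_to_load` are hypothetical.

```python
import os
from typing import List, Mapping, Optional

# Placeholder library names (assumption); real file names vary by platform.
OPENCL_LIB = "libur_adapter_opencl.so"
L0_V1_LIB = "libur_adapter_level_zero.so"
L0_V2_LIB = "libur_adapter_level_zero_v2.so"
LEVEL_ZERO_LIBS = {L0_V1_LIB, L0_V2_LIB}


def select_level_zero_adapter(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the one Level Zero adapter that should be active."""
    if env is None:
        env = os.environ
    # Assumption: only the value "1" opts in to v2; unset or any other value keeps v1.
    if env.get("UR_ADAPTER_LEVEL_ZERO_V2") == "1":
        return L0_V2_LIB
    return L0_V1_LIB


def adapters_to_load(installed: List[str],
                     env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Filter installed adapters so that at most one Level Zero adapter is loaded.

    Non-L0 adapters (e.g. OpenCL) are always kept; of the two L0 adapters only
    the selected one survives, so L0 devices are enumerated once, not twice.
    """
    chosen = select_level_zero_adapter(env)
    return [lib for lib in installed if lib not in LEVEL_ZERO_LIBS or lib == chosen]


if __name__ == "__main__":
    installed = [OPENCL_LIB, L0_V1_LIB, L0_V2_LIB]
    print(adapters_to_load(installed, {}))
    print(adapters_to_load(installed, {"UR_ADAPTER_LEVEL_ZERO_V2": "1"}))
```

Because the filter keeps exactly one L0 library, shipping both adapters in the package does not add a third enumeration of each GPU on top of the existing OpenCL/Level Zero pair.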

@omarahmed1111 marked this pull request as ready for review on January 17, 2025 15:29
@omarahmed1111 requested review from a team as code owners on January 17, 2025 15:29
@@ -68,6 +68,7 @@ def do_configure(args):
 
     if sys.platform != "darwin":
         sycl_enabled_backends.append("level_zero")
+        sycl_enabled_backends.append("level_zero_v2")
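The hunk only shows the tail of the backend logic in configure.py. As a self-contained sketch of what it does, the function below rebuilds the backend list for a given host platform. The initial `["opencl"]` entry, the `include_l0_v2` parameter (a stand-in for the configure switch requested in review, not an existing configure.py option), and the assumption that the list reaches CMake as a semicolon-separated `SYCL_ENABLE_BACKENDS` value are all illustrative.

```python
import sys
from typing import List


def enabled_backends(platform: str = sys.platform, include_l0_v2: bool = True) -> List[str]:
    """Sketch of how do_configure() assembles the SYCL backend list.

    `include_l0_v2` is hypothetical: it models the opt-out switch discussed
    in review and is not an existing configure.py argument.
    """
    backends = ["opencl"]  # assumption: other backends are added before L0
    if platform != "darwin":
        # Level Zero is skipped on macOS, so neither L0 adapter is added there.
        backends.append("level_zero")
        if include_l0_v2:
            backends.append("level_zero_v2")
    return backends


def cmake_backends_flag(backends: List[str]) -> str:
    # CMake lists are semicolon-separated strings; the variable name is an assumption.
    return "-DSYCL_ENABLE_BACKENDS=" + ";".join(backends)


print(cmake_backends_flag(enabled_backends("linux")))
# prints -DSYCL_ENABLE_BACKENDS=opencl;level_zero;level_zero_v2
```

Building both adapters whenever Level Zero is built is what the one-line change does; a flag like the hypothetical `include_l0_v2` is one way the "only v1 or v2" build option requested below could be expressed.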
Contributor


Couple of comments

  1. Can you explain what L0 V2 is and how it relates to the current L0 adapter?

  2. I agree with James: we shouldn't build two plugins by default. A configure switch to build both is fine.

Contributor

@pbalcer commented Jan 17, 2025


  1. See the v2 README. In short, it's a new L0 adapter with re-implemented performance-critical APIs. See here for performance validation results.
  2. Why though? As I noted, with the current implementation, users won't be affected by the inclusion of the two adapters. Not including v2 by default, on the other hand, would make it much more difficult for application and framework developers to experiment with the new adapter, which is the whole point of including it in SYCL right now. In practice, we would likely need to provide users who want to try it with an alternative prebuilt compiler package.

@igchor ping

Contributor


Out of curiosity, why "v2" instead of incremental refactoring?

Contributor

@sarnex commented Jan 17, 2025


  1. Got it, thanks

  2. At this point it seems like the v2 adapter is only intended to be used by DPCPP/UR developers, and IMO features like that should not be built by default. If the feature is ready enough for downstream clients, should we switch over by default?

Contributor

@pbalcer commented Jan 17, 2025


> Out of curiosity, why "v2" instead of incremental refactoring?

The existing L0 adapter code base has accumulated a lot of complexity over the years as L0 and the driver evolved, and it has become very difficult to change. For example, the adapter supports 4 major queue modes: "batched in order", "batched out of order", "immediate in order", and "immediate out of order". In the existing adapter, they are all implemented together, with various conditionals sprinkled throughout to make it work. In practice, batched mode is all but legacy for 99% of scenarios, and the "in order" and "out of order" L0 paths are so different (different types of L0 events, different ways of synchronizing) that there's not much to gain from sharing all the code between them.
Refactoring this incrementally, without breaking existing code, would be challenging. So the team has decided that a partial re-implementation (we still reuse all that's practical) of the performance critical paths is the best path forward. This also means that we have the opportunity to easily leverage new driver features and do a clean break away from the accumulated legacy tech debt.
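The design argument above, separate code paths per queue mode instead of one implementation with mode checks everywhere, can be illustrated with a toy Python model (Python 3.9+ for `graphlib`). All class and function names are hypothetical and the real adapters are C++. The in-order queue relies purely on submission order, while the out-of-order queue tracks explicit dependencies (standing in for L0 events) and resolves them with `graphlib.TopologicalSorter`. Batched submission is not modelled.

```python
from abc import ABC, abstractmethod
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Set


class Queue(ABC):
    """Hypothetical common interface shared by the per-mode implementations."""

    @abstractmethod
    def enqueue(self, op: str, deps: Iterable[str] = ()) -> None: ...

    @abstractmethod
    def finish(self) -> List[str]:
        """Return the operations in a valid execution order and reset the queue."""


class InOrderQueue(Queue):
    """In-order: each op implicitly waits for the previous one."""

    def __init__(self) -> None:
        self._ops: List[str] = []

    def enqueue(self, op: str, deps: Iterable[str] = ()) -> None:
        # In this single-queue model, submission order already serializes ops,
        # so explicit deps need no tracking.
        self._ops.append(op)

    def finish(self) -> List[str]:
        done, self._ops = self._ops, []
        return done


class OutOfOrderQueue(Queue):
    """Out-of-order: ops only wait for their explicit dependencies."""

    def __init__(self) -> None:
        self._deps: Dict[str, Set[str]] = {}

    def enqueue(self, op: str, deps: Iterable[str] = ()) -> None:
        self._deps[op] = set(deps)

    def finish(self) -> List[str]:
        # Any order that respects the dependency graph is valid.
        order = list(TopologicalSorter(self._deps).static_order())
        self._deps = {}
        return order


def make_queue(in_order: bool) -> Queue:
    # A single dispatch point, instead of `if in_order:` checks inside every method.
    return InOrderQueue() if in_order else OutOfOrderQueue()
```

Each class uses the synchronization scheme natural to its mode without branching on the other one, so supporting a new mode means adding a class and one line in `make_queue` rather than touching every method.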

> At this point it seems like the v2 adapter is only intended to be used by DPCPP/UR developers, and IMO features like that should not be built by default. If the feature is ready enough for downstream clients, should we switch over by default?

In terms of quality, the v2 adapter has a higher UR CTS pass rate than the current one and passes the vast majority of SYCL e2e tests (the big exception being tests that hard-code the exact sequence of operations expected from the adapter). However, it is still a work in progress.

@igchor
Member

igchor commented Jan 29, 2025

@omarahmed1111 will this approach enable us to mark specific e2e tests with XFAIL for v2 only, or would that require some extra changes?

@omarahmed1111
Contributor Author

@intel/dpcpp-devops-reviewers Gentle ping on this!

@omarahmed1111 changed the title from "Enable level zero v2 in DPC++" to "[SYCL] Enable level zero v2 in DPC++" on Feb 21, 2025
@pbalcer
Contributor

pbalcer commented Feb 25, 2025

@intel/dpcpp-devops-reviewers please review. This is blocking our progress on properly enabling v2 in tests and the compiler packages.
Thanks!

Contributor

@aelovikov-intel left a comment


Please add links to the in-tree documentation describing rationale and short/mid/long-term plans for this.

@igchor
Member

igchor commented Feb 25, 2025

> Please add links to the in-tree documentation describing rationale and short/mid/long-term plans for this.

I believe @pbalcer explained the rationale in his previous comment. As for the plans, we want to provide v2 as an experimental adapter in the 2025.2 release to gather feedback from users and, if needed, add missing features (v2 does not support all of the environment variables used to tweak the legacy adapter). In 2026.0, we plan to make v2 the default L0 adapter.

I guess we can add this information to https://github.com/intel/llvm/blob/sycl/unified-runtime/source/adapters/level_zero/v2/README.md

@aelovikov-intel
Contributor

> I believe @pbalcer explained the rationale in his previous comment.

I didn't ask for the explanation itself, but for a link to it as a comment in the code, so that it is immediately available to anyone reading the code in the future.

Contributor

@aelovikov-intel left a comment


> Please add links to the in-tree documentation describing rationale and short/mid/long-term plans for this

You've only added the former and not the latter... Also, having links to a comments thread in PR review is weird at the very least.

Anyhow, someone else gave you an approval for the SYCL RT, so I'm only LGTM'ing the configure.py change following those pre-approved changes.

@omarahmed1111
Contributor Author

> Please add links to the in-tree documentation describing rationale and short/mid/long-term plans for this
>
> You've only added the former and not the latter... Also, having links to a comments thread in PR review is weird at the very least.
>
> Anyhow, someone else gave you an approval for the SYCL RT, so I'm only LGTM'ing the configure.py change following those pre-approved changes.

Apologies, I misunderstood that. I modified the comment to reference the page @igchor mentioned and added a short note about the timeline. I didn't want to add more than that, as I think the L0 v2 team can explain things better in that document, so I'll keep this PR to the functionality plus this short comment.

@omarahmed1111
Contributor Author

@intel/llvm-gatekeepers Please merge, Thanks!

@dm-vodopyanov merged commit cc0b20f into intel:sycl on Feb 28, 2025
29 of 30 checks passed
Development

Successfully merging this pull request may close these issues.

Add a cmake switch to build the L0 v2 adapter
10 participants