generic sycl: refactor kernel mainloops #2070
fe9cb1d
to
6b6d294
Compare
Thanks for the PR. Would you have some performance data you can share?
No, we did not run any benchmarks. For now this can be taken as an improvement in how the code is laid out, although it should improve performance as well.
make test
src/gpu/generic/sycl/sycl_utils.hpp
Outdated
@@ -35,6 +37,18 @@ inline bool md_dims_in_range(const dnnl::impl::memory_desc_t *desc) {
    return true;
}

inline ::sycl::nd_range<1> get_range(const exec_ctx_t &ctx, int work_amount) {
    auto eng = dnnl::engine(ctx.stream()->engine(), true);
    auto device = dnnl::sycl_interop::get_device(eng);
Please don't use the public API inside the library; use the internal API instead:
const auto *sycl_engine_impl = utils::downcast<const xpu::sycl::engine_impl_t *>(eng->impl());
auto device = sycl_engine_impl->device();
Fixed
5cee3f2
to
f1be8f4
Compare
@t4c1 can you please address the conflict that appeared after I merged the cublaslt PR?
f1be8f4
to
aafa2d8
Compare
aafa2d8
to
194e8a2
Compare
done
Refactors the main loops in many SYCL kernels so that consecutive work-items work on consecutive calculations. This should improve performance on Nvidia and AMD GPUs, as coalesced memory transfers can be used in more cases. The amount of duplicated code is also reduced.
This PR should be easier to review with whitespace changes hidden, since in many places the only difference is the indentation level.
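To illustrate the access pattern described above, here is a minimal host-side sketch in plain C++ (no SYCL runtime needed). It is not code from this PR: the helper names `blocked_index`, `interleaved_index` and `interleaved_covers_once` are hypothetical, and the sketch assumes a flat 1D buffer. It compares a mapping where each work-item owns a contiguous block with one where consecutive work-items take consecutive elements on every iteration, which is the pattern that lets neighbouring work-items issue coalesced loads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mapping: each work-item owns a contiguous block of `per_item`
// elements, so neighbouring work-items are `per_item` elements apart.
inline std::size_t blocked_index(
        std::size_t item, std::size_t iter, std::size_t per_item) {
    return item * per_item + iter;
}

// Hypothetical mapping: on every iteration, work-item `item` takes the
// element right after the one taken by work-item `item - 1`.
inline std::size_t interleaved_index(
        std::size_t item, std::size_t iter, std::size_t n_items) {
    return iter * n_items + item;
}

// Checks that the interleaved loop visits every element of a buffer of size
// `total` exactly once. The loop bound handles a tail that is not a multiple
// of `n_items`.
inline bool interleaved_covers_once(std::size_t total, std::size_t n_items) {
    assert(n_items > 0);
    std::vector<int> hits(total, 0);
    for (std::size_t item = 0; item < n_items; ++item) {
        // Same indices as interleaved_index(item, iter, n_items) for iter = 0, 1, ...
        for (std::size_t idx = item; idx < total; idx += n_items) {
            ++hits[idx];
        }
    }
    for (int h : hits) {
        if (h != 1) return false;
    }
    return true;
}
```

In the interleaved mapping, adjacent work-items are always one element apart within an iteration, whereas in the blocked mapping they are `per_item` elements apart. Whether this translates into a measurable speedup depends on the device and the kernel.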