Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generic sycl: refactor kernel mainloops #2070

Merged
merged 1 commit into from
Sep 20, 2024

Conversation

t4c1
Copy link
Contributor

@t4c1 t4c1 commented Sep 2, 2024

Refactors mainloops in many sycl kernels so that consecutive workitems work on consecutive calculations. This should improve performace on Nvidia and AMD GPUs, as coalesced memory transfers can be used in more cases. The amount of duplicated code is also reduced.

This PR should be easier to review with hiding of whitespace changes, as in many places there is difference only in indentation levels.

@t4c1 t4c1 requested a review from a team as a code owner September 2, 2024 13:25
@github-actions github-actions bot added the platform:gpu-generic Codeowner: @oneapi-src/onednn-gpu-generic label Sep 2, 2024
@t4c1 t4c1 force-pushed the reorg_mainloop branch 2 times, most recently from fe9cb1d to 6b6d294 Compare September 3, 2024 08:39
@mgouicem
Copy link
Contributor

mgouicem commented Sep 3, 2024

Thanks for the PR. Would you have some performance data you can share?

@t4c1
Copy link
Contributor Author

t4c1 commented Sep 3, 2024

No, we did not run any benchmarks. For now this can be taken as improvement in how code is laid out, although it should improve performance as well.

@vpirogov
Copy link
Member

vpirogov commented Sep 5, 2024

make test
disable device_cpu
enable device_gpu
enable thr_cuda
enable arch_rtx
enable thr_hip
enable arch_instinct

@@ -35,6 +37,18 @@ inline bool md_dims_in_range(const dnnl::impl::memory_desc_t *desc) {
return true;
}

inline ::sycl::nd_range<1> get_range(const exec_ctx_t &ctx, int work_amount) {
auto eng = dnnl::engine(ctx.stream()->engine(), true);
auto device = dnnl::sycl_interop::get_device(eng);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use public API inside the library, use the internal API.

const auto *sycl_engine_impl = utils::downcast<const xpu::sycl::engine_impl_t *>(eng->impl());
auto device = sycl_engine_impl->device());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@vpirogov vpirogov added this to the v3.7 milestone Sep 9, 2024
@densamoilov
Copy link
Contributor

@t4c1 can you please address the conflict that appeared after I merged the cublaslt PR.

@t4c1
Copy link
Contributor Author

t4c1 commented Sep 19, 2024

done

@densamoilov densamoilov merged commit 7131c1b into oneapi-src:main Sep 20, 2024
11 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
platform:gpu-generic Codeowner: @oneapi-src/onednn-gpu-generic
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants