generic sycl: refactor kernel mainloops #2070

t4c1 · 2024-09-02T13:25:34Z

Refactors the mainloops in many SYCL kernels so that consecutive work-items handle consecutive calculations. This should improve performance on Nvidia and AMD GPUs, since coalesced memory transfers can be used in more cases. It also reduces the amount of duplicated code.
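For illustration only, the difference between the two index mappings can be sketched in plain host C++ (no SYCL runtime needed). The function names `blocked_indices` and `strided_indices` are hypothetical and do not appear in this PR; the sketch only assumes that a kernel distributes `work_amount` calculations over `n_items` work-items.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper, not from the PR: each work-item owns one contiguous
// chunk. On any given loop iteration, neighbouring work-items touch indices
// that are `chunk` elements apart, so their accesses are not adjacent.
inline std::vector<std::size_t> blocked_indices(
        std::size_t item, std::size_t n_items, std::size_t work_amount) {
    const std::size_t chunk = (work_amount + n_items - 1) / n_items;
    const std::size_t begin = item * chunk;
    const std::size_t end = std::min(begin + chunk, work_amount);
    std::vector<std::size_t> out;
    for (std::size_t i = begin; i < end; ++i)
        out.push_back(i);
    return out;
}

// Hypothetical helper, not from the PR: on iteration k, work-item `item`
// handles index item + k * n_items. Neighbouring work-items therefore touch
// neighbouring indices on every iteration, which is the pattern that lets
// the hardware coalesce memory transfers.
inline std::vector<std::size_t> strided_indices(
        std::size_t item, std::size_t n_items, std::size_t work_amount) {
    std::vector<std::size_t> out;
    for (std::size_t i = item; i < work_amount; i += n_items)
        out.push_back(i);
    return out;
}
```

With 4 work-items and 10 calculations, the strided mapping gives work-items 0 and 1 the indices {0, 4, 8} and {1, 5, 9}, whereas the blocked mapping gives them {0, 1, 2} and {3, 4, 5}.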

This PR should be easier to review with whitespace changes hidden, since in many places only the indentation level differs.

src/gpu/generic/sycl/binary_kernels.hpp

mgouicem · 2024-09-03T10:08:40Z

Thanks for the PR. Would you have some performance data you can share?

t4c1 · 2024-09-03T10:11:18Z

No, we did not run any benchmarks. For now this can be taken as an improvement in how the code is laid out, although it should improve performance as well.

vpirogov · 2024-09-05T15:54:40Z

make test
disable device_cpu
enable device_gpu
enable thr_cuda
enable arch_rtx
enable thr_hip
enable arch_instinct

densamoilov · 2024-09-06T19:19:55Z

src/gpu/generic/sycl/sycl_utils.hpp

@@ -35,6 +37,18 @@ inline bool md_dims_in_range(const dnnl::impl::memory_desc_t *desc) {
    return true;
 }

+inline ::sycl::nd_range<1> get_range(const exec_ctx_t &ctx, int work_amount) {
+    auto eng = dnnl::engine(ctx.stream()->engine(), true);
+    auto device = dnnl::sycl_interop::get_device(eng);


Please don't use the public API inside the library; use the internal API instead.

const auto *sycl_engine_impl = utils::downcast<const xpu::sycl::engine_impl_t *>(eng->impl());
auto device = sycl_engine_impl->device();
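The diff excerpt shows only the first lines of `get_range`, so its actual body is not known from this thread. As a rough sketch of what a helper of this kind generally has to do: a SYCL `nd_range` requires the global size to be a multiple of the work-group size, so the work amount is usually rounded up and the kernel skips the padding indices. The names `range_1d` and `make_range` below are hypothetical, and `max_wg_size` stands in for whatever limit the device query returns.

```cpp
#include <cstddef>

// Hypothetical stand-in for ::sycl::nd_range<1>, so the sketch compiles
// without a SYCL toolchain.
struct range_1d {
    std::size_t global; // total number of work-items launched
    std::size_t local; // work-group size
};

// Hypothetical helper, not the PR's get_range: pick a work-group size no
// larger than the device limit and round the global size up to a multiple
// of it.
inline range_1d make_range(std::size_t work_amount, std::size_t max_wg_size) {
    std::size_t local = work_amount < max_wg_size ? work_amount : max_wg_size;
    if (local == 0) local = 1; // avoid dividing by zero for empty work
    const std::size_t global = (work_amount + local - 1) / local * local;
    return {global, local};
}
```

Because `global` can exceed `work_amount`, a kernel launched over such a range would need a bounds check (for example `if (idx < work_amount)`) before touching memory.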

densamoilov · 2024-09-19T00:34:13Z

@t4c1 can you please address the conflict that appeared after I merged the cublaslt PR?

t4c1 · 2024-09-19T07:52:43Z

done

t4c1 requested a review from a team as a code owner September 2, 2024 13:25

github-actions bot added the platform:gpu-generic Codeowner: @oneapi-src/onednn-gpu-generic label Sep 2, 2024

t4c1 force-pushed the reorg_mainloop branch 2 times, most recently from fe9cb1d to 6b6d294 Compare September 3, 2024 08:39

mgouicem reviewed Sep 3, 2024

t4c1 force-pushed the reorg_mainloop branch from 6b6d294 to d0719bc Compare September 3, 2024 10:30

ShanoToni approved these changes Sep 4, 2024

t4c1 force-pushed the reorg_mainloop branch from d0719bc to 5cee3f2 Compare September 6, 2024 09:24

densamoilov reviewed Sep 6, 2024

vpirogov added this to the v3.7 milestone Sep 9, 2024

t4c1 force-pushed the reorg_mainloop branch from 5cee3f2 to f1be8f4 Compare September 18, 2024 09:40

kala855 approved these changes Sep 18, 2024

densamoilov approved these changes Sep 19, 2024

t4c1 force-pushed the reorg_mainloop branch from f1be8f4 to aafa2d8 Compare September 19, 2024 07:38

generic sycl: refactor kernel mainloops

194e8a2

t4c1 force-pushed the reorg_mainloop branch from aafa2d8 to 194e8a2 Compare September 19, 2024 07:52

densamoilov merged commit 7131c1b into oneapi-src:main Sep 20, 2024
11 of 12 checks passed
