-
Notifications
You must be signed in to change notification settings - Fork 762
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Enable mapping of group load/store functions to SPIRV built-ins for local address space #16653
Conversation
…dress space Currently intrinsics for local address space are not supported by cpu/fpga backends, so introduce undocumented native_local_block_io property which allows to enable mapping to those intrinsics.
36e16ba
to
622b2c7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
we had that recent reversal and fix of striped vs. block which @aelovikov-intel had to deal with. I don't see any problem here, but maybe he should double check
Thank you! No, problems are not expected because PR relies on existing logic regarding this (i.e. striped vs block logic is the same for global and local ptr cases) |
// CHECK-GLOBAL-NEXT: [[TMP0:%.*]] = load i64, ptr [[V:%.*]], align 8, !tbaa [[TBAA15]] | ||
// CHECK-GLOBAL-NEXT: store i64 [[TMP0]], ptr [[AGG_TMP1]], align 8, !tbaa [[TBAA15]] | ||
// CHECK-GLOBAL-NEXT: call void @llvm.memcpy.p0.p4.i64(ptr align 8 [[AGG_TMP2]], ptr addrspace(4) align 8 [[ITER:%.*]], i64 80, i1 false), !tbaa.struct [[TBAA_STRUCT92:![0-9]+]] | ||
// CHECK-GLOBAL-NEXT: tail call spir_func void @_ZN4sycl3_V13ext6oneapi12experimental11group_storeINS0_9sub_groupEiLm2ENS0_6detail17accessor_iteratorIiLi1EEENS3_10propertiesINS3_6detail20properties_type_listIJNS3_14property_valueINS3_18data_placement_keyEJSt17integral_constantIiLi1EEEEENSC_INS3_21contiguous_memory_keyEJEEENSC_INS3_14full_group_keyEJEEENSC_INSA_25native_local_block_io_keyEJEEEEEEEEEENSt9enable_ifIXaaaasr6detailE18verify_store_typesIT0_T2_Esr6detailE18is_generic_group_vIT_E18is_property_list_vIT3_EEvE4typeESS_NS0_4spanISQ_XT1_EEESR_ST_(ptr noundef nonnull byval(%"struct.sycl::_V1::sub_group") align 1 [[AGG_TMP]], ptr noundef nonnull byval(%"class.sycl::_V1::span.22") align 8 [[AGG_TMP1]], ptr noundef nonnull byval(%"class.sycl::_V1::detail::accessor_iterator") align 8 [[AGG_TMP2]], ptr noundef nonnull byval(%"class.sycl::_V1::ext::oneapi::experimental::properties.28") align 1 [[AGG_TMP3]]) #[[ATTR7]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks wrong for the test_accessor_iter_force_optimized
.
Update: my mistake, ignore.
Extension: https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_subgroup_local_block_io.html
Currently these built-ins for local address space are not supported by cpu/fpga backends, so introduce undocumented
native_local_block_io
property which allows to enable mapping to those built-ins. If this property is not provided then implementation falls back to naive approach.