Underspecified behavior of accessor::get_pointer and accessor::get_multi_ptr #580

Pennycook · 2024-07-03T15:22:13Z

The specification says:

This function may only be called from within a command.

...but it doesn't say what should happen if either function is called from outside of a command. An implementation is free to return any pointer (including a NULL pointer) which may result in bugs that are difficult to track down or understand. I've seen code like the below hang or segfault across different devices:

q.submit([&](sycl::handler &h) {

  // This is an (undiagnosed) error, because get_multi_ptr is called on the host
  ptr = buf.get_access<sycl::access::mode::read_write, sycl::access::target::device>(h)
           .get_multi_ptr<sycl::access::decorated::no>();

  h.parallel_for(..., [=](sycl::nd_item<1> it) {
    foo(ptr); // The pointer passed to the device code is undefined
  });
});

Is there a reason that we cannot define get_multi_ptr() to throw an error when called on the host?

If this is intended to be undefined behavior, I think the specification should state that explicitly. But an error would be more useful to developers.

The text was updated successfully, but these errors were encountered:

gmlueck · 2024-07-03T16:38:22Z

I would like DPC++ to diagnose this as a compile-time error. Therefore, I don't want the spec to mandate an exception because that would prevent me from treating the code as ill-formed.

Pennycook · 2024-07-03T16:54:56Z

I would like DPC++ to diagnose this as a compile-time error. Therefore, I don't want the spec to mandate an exception because that would prevent me from treating the code as ill-formed.

Ah, okay.

Would it make sense to consider introducing some new wording that gives implementations that level of flexibility, but not as much flexibility as undefined behavior? Something that basically means "an implementation must diagnose this as an error, either at compile-time or run-time"? I think that would still be testable in the CTS, because a conforming implementation would have to either fail to compile the test or terminate (somehow) at run-time.

nliber · 2024-07-03T17:29:16Z

Do you mean something like the new Erroneous Behavior (well defined but a bug) category in C++26? I would support that direction.

Pennycook · 2024-07-03T17:50:37Z

Do you mean something like the new Erroneous Behavior (well defined but a bug) category in C++26? I would support that direction.

Something like it, yeah. I wasn't sure if we could re-use "erroneous behavior" directly, because I only really know about it in the context of uninitialized variables. But the idea would be basically the same, if I understand it correctly:

The value returned by get_pointer() when called on the host would be some implementation-defined value.
Using the value returned by get_pointer() on the host would be ~"erroneous behavior".
Implementations would be encouraged to issue a (run-time) diagnostic if ~"erroneous behavior" is encountered.
Implementations would be allowed to issue a (compile-time) diagnostic if they can determine ~"erroneous behavior" will be encountered.

It would probably be safest to pick a new name, and only adopt "erroneous behavior" if we're fully aligned with the final C++26 wording.

gmlueck · 2024-07-03T21:02:04Z

I don't fully understand the concept of "erroneous behavior". If we specified that calling get_pointer on the host is erroneous behavior, would an implementation be allowed to fail compilation with an error if it statically detected such a call?

Pennycook · 2024-07-08T14:21:59Z

I don't fully understand the concept of "erroneous behavior". If we specified that calling get_pointer on the host is erroneous behavior, would an implementation be allowed to fail compilation with an error if it statically detected such a call?

That's my reading of the paper, at least:

Both runtime sanitizers and static analysis can use the code readability signal from both undefined and erroneous behaviour equally well. In both cases it is clear that the code is incorrect. For undefined behaviour, implementations are unconstrained anyway and tools may reject or diagnose at runtime. The goal of erroneous behaviour is to permit the exact same treatment, by allowing a conforming implementation to diagnose, terminate (and also reject) a program that contains erroneous behaviour.

But it's important to note that an implementation isn't required to diagnose erroneous behavior at compile-time. Thinking about things a bit more over the weekend, I realized that we probably can't throw an error every time we see get_pointer() in host code, because of cases like this:

void foo(auto accessor) {
  auto ptr = accessor.get_pointer();
}

I don't think that we can prove conclusively whether or not foo will ever be called from host code. For example, it might be called via a function pointer, and there are no restrictions on using those in host code. But we can't just throw an error because this looks like host code, because the user might intend that foo is only ever called on the device. So, it seems that the best an implementation like DPC++ could do is to try and detect such errors at compile-time, but always diagnose the error at run-time as a fallback.

To be fully clear, I think the options open to implementations would be something like the following:

Always return a NULL pointer from get_pointer(). (NB: This is a stronger guarantee than UB.)
Always return a specific "poison" value from get_pointer().
Print a warning if the pointer resulting from get_pointer() is used in any way.
Terminate the program if the pointer resulting from get_pointer() is used in any way.
Reject the program at compile-time if the pointer resulting from get_pointer() is always used.

The paper mentions that it would be legal to add a flag (similar to -ffast-math) to assert that a program contains no such erroneous behavior, so developers could opt-out of these safety checks for performance.

illuhad · 2024-07-09T23:29:50Z

Is there a reason that we cannot define get_multi_ptr() to throw an error when called on the host?

It's probably impossible to detect whether you are on the host or inside device code e.g. in a library-only host implementation. Anything that relies on such a branching (including returning e.g. nullptr) might not work in such a case. So I vote in favor of UB. Other implementation types can of course add whatever diagnostic they like.

Pennycook · 2024-07-10T14:01:22Z

It's probably impossible to detect whether you are on the host or inside device code e.g. in a library-only host implementation. Anything that relies on such a branching (including returning e.g. nullptr) might not work in such a case. So I vote in favor of UB. Other implementation types can of course add whatever diagnostic they like.

Can you say a bit more about the sort of library-only implementation that you're concerned about?

It seems to me like even a single-threaded library-only host implementation could distinguish between host code and device code (e.g., by introducing some global state). A SYCL implementation only transitions from executing host code to executing device code at certain points (which are under the implementation's control), so what is stopping an implementation from tracking that?

illuhad · 2024-07-10T19:06:59Z

It seems to me like even a single-threaded library-only host implementation could distinguish between host code and device code (e.g., by introducing some global state). A SYCL implementation only transitions from executing host code to executing device code at certain points (which are under the implementation's control), so what is stopping an implementation from tracking that?

That's fair - I was thinking in terms of determining statically whether we are inside host/device code. Everything else, like runtime queries with a global object, will be prohibitively costly for performance for something like get_pointer() which might be used frequently in hot parts of the kernel for memory access.

Pennycook · 2024-07-10T20:12:23Z

That's fair - I was thinking in terms of determining statically whether we are inside host/device code. Everything else, like runtime queries with a global object, will be prohibitively costly for performance for something like get_pointer() which might be used frequently in hot parts of the kernel for memory access.

Oh, I definitely agree that this could have significant cost.

But "erroneous behavior" allows implementations to provide flags that disable the checks. So, an implementation where the checks would be prohibitively expensive could have them turned on by default (for safety/compliance reasons) but recommend that users run with some standard options to improve performance. For a library-only implementation that would probably be a macro, whereas for a compiler-based implementation it could be a real flag.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Underspecified behavior of accessor::get_pointer and accessor::get_multi_ptr #580

Underspecified behavior of accessor::get_pointer and accessor::get_multi_ptr #580

Pennycook commented Jul 3, 2024 •

edited

Loading

gmlueck commented Jul 3, 2024

Pennycook commented Jul 3, 2024

nliber commented Jul 3, 2024 •

edited

Loading

Pennycook commented Jul 3, 2024

gmlueck commented Jul 3, 2024

Pennycook commented Jul 8, 2024

illuhad commented Jul 9, 2024

Pennycook commented Jul 10, 2024

illuhad commented Jul 10, 2024

Pennycook commented Jul 10, 2024

Underspecified behavior of accessor::get_pointer and accessor::get_multi_ptr #580

Underspecified behavior of accessor::get_pointer and accessor::get_multi_ptr #580

Comments

Pennycook commented Jul 3, 2024 • edited Loading

gmlueck commented Jul 3, 2024

Pennycook commented Jul 3, 2024

nliber commented Jul 3, 2024 • edited Loading

Pennycook commented Jul 3, 2024

gmlueck commented Jul 3, 2024

Pennycook commented Jul 8, 2024

illuhad commented Jul 9, 2024

Pennycook commented Jul 10, 2024

illuhad commented Jul 10, 2024

Pennycook commented Jul 10, 2024

Pennycook commented Jul 3, 2024 •

edited

Loading

nliber commented Jul 3, 2024 •

edited

Loading