Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-POD arguments passed incorrectly on ARM64 #106471

Closed
azeno opened this issue Aug 15, 2024 · 18 comments
Closed

Non-POD arguments passed incorrectly on ARM64 #106471

azeno opened this issue Aug 15, 2024 · 18 comments
Labels
area-Interop-coreclr documentation Documentation bug or enhancement, does not impact product or test code
Milestone

Comments

@azeno
Copy link

azeno commented Aug 15, 2024

Description

While trying to run ImGui.NET on win-arm64 I ran into some sort of argument passing issue from managed to native code when calling https://github.com/cimgui/cimgui/blob/35a4e8f8932c6395156ffacee288b9c30e50cb63/cimgui.cpp#L207 via the managed wrapper https://github.com/ImGuiNET/ImGui.NET/blob/70a87022f775025b90dbe2194e44983c79de0911/src/ImGui.NET/Generated/ImGui.gen.cs#L21374

I managed to reproduce it with following stripped down version. Note that this code runs fine in win-x64. It also runs fine in win-arm64 when removing the default constructor from the Vector2 struct. This makes me wonder whether it's even a dotnet issue, therefor feel free to point me to the appropriate channels if it isn't.

Reproduction Steps

C#

using System.Runtime.InteropServices;

var result = NativeMethods.MyFunction(default, default);
Console.WriteLine($"{nameof(NativeMethods.MyFunction)} returned {result}");

struct Vector2 { float x, y; }

static class NativeMethods
{
    [DllImport("PInvoke.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern int MyFunction(Vector2 vector, int value);
}

C++

struct Vector2
{
    float x, y;
    // Uncomment the following line and MyFunction returns the correct result
    Vector2() : x(0.0f), y(0.0f) { }
};

extern "C" __declspec(dllexport) int MyFunction(Vector2 vector, int value)
{
    return value;
}

Attached is a little VS solution containing the above code
PInvoke.zip

Expected behavior

MyFunction should receive 0 as value.

Actual behavior

MyFunction receives some random number as value.

Regression?

No response

Known Workarounds

No response

Configuration

.net8, Windows 11, ARM64

Other information

I've also tried building the project with clang but got the same result.

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Aug 15, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Aug 15, 2024
@huoyaoyuan huoyaoyuan added area-Interop-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Aug 15, 2024
@huoyaoyuan
Copy link
Member

It's caused by ABI difference on ARM64. You can see more details at https://learn.microsoft.com/cpp/build/arm64-windows-abi-conventions

In short, the default constructor in C++ code make it no longer considered HFA on ARM64:

A type is considered to be an HFA or HVA if all of the following hold:

  • It's non-empty,
  • It doesn't have any non-trivial default or copy constructors, destructors, or assignment operators,
  • All of its members have the same HFA or HVA type, or are float, double, or neon types that match the other members' HFA or HVA types.

The argument passing schema for HFA and regular structs are different on ARM64. On x64 there's no such concept and all structs are passed in the same way.

I'm not sure whether .NET supports passing HFAs as a single argument at all.

@azeno
Copy link
Author

azeno commented Aug 16, 2024

Thank you for the explanation. As a workaround it was possible in my case to change the default constructor to Vector2() = default;.

azeno added a commit to vvvv/imgui that referenced this issue Aug 16, 2024
azeno added a commit to vvvv/cimgui that referenced this issue Aug 16, 2024
azeno added a commit to vvvv/ImGui.NET-nativebuild that referenced this issue Aug 16, 2024
azeno added a commit to vvvv/VL.StandardLibs that referenced this issue Aug 16, 2024
This commit also rebuilds the x64 binaries to ensure both use the same source code after applying workaorund from dotnet/runtime#106471
azeno added a commit to vvvv/VL.StandardLibs that referenced this issue Aug 16, 2024
This commit also rebuilds the x64 binaries to ensure both use the same source code after applying workaorund from dotnet/runtime#106471
@jkotas
Copy link
Member

jkotas commented Aug 17, 2024

.NET interop targets C ABI. The structs used for interop must be C structs on the unmanaged side in a portable code. They cannot have C++ constructors, destructors or virtual methods.

The calling convention details for non-POD (Plain Old Data) C++ types are often different from the calling convention details of plain C structs as you have discovered.

This came up a few times before (e.g. in #12312). We should mention this in the docs.

@jkotas jkotas added the documentation Documentation bug or enhancement, does not impact product or test code label Aug 17, 2024
@jkotas jkotas changed the title Argument not passed correctly on ARM64 Non-POD arguments passed incorrectly on ARM64 Aug 17, 2024
@AaronRobinsonMSFT AaronRobinsonMSFT added this to the 10.0.0 milestone Sep 29, 2024
@AaronRobinsonMSFT AaronRobinsonMSFT removed the untriaged New issue has not been triaged by the area owner label Sep 29, 2024
@josetr
Copy link

josetr commented Nov 10, 2024

.NET interop targets C ABI. The structs used for interop must be C structs on the unmanaged side in a portable code. They cannot have C++ constructors, destructors or virtual methods.

The calling convention details for non-POD (Plain Old Data) C++ types are often different from the calling convention details of plain C structs as you have discovered.

This came up a few times before (e.g. in #12312). We should mention this in the docs.

https://github.com/josetr/pinvoke-arm64-issue

Can you take a look at it?

It does seem to work even with constructors / destructors / virtual methods as long as the structure is bigger than 16 bytes

Use case: we were trying to add arm support for https://github.com/mono/CppSharp

@jkotas
Copy link
Member

jkotas commented Nov 10, 2024

It does seem to work even with constructors / destructors / virtual methods as long as the structure is bigger than 16 bytes

This is not unexpected. The exact set of cases where non-POD C++ type happens to be passed the same way as POD C++ types varies between architectures.

@jkotas
Copy link
Member

jkotas commented Nov 10, 2024

This came up a few times before (e.g. in #12312). We should mention this in the docs.

We have interop with C++ mentioned in the docs now: https://github.com/dotnet/docs/blob/main/docs/standard/native-interop/abi-support.md#c-1 . cc @AaronRobinsonMSFT

EDIT Official doc - https://learn.microsoft.com/dotnet/standard/native-interop/abi-support#c-1

@jkotas jkotas closed this as completed Nov 10, 2024
@tritao
Copy link
Contributor

tritao commented Nov 10, 2024

This came up a few times before (e.g. in #12312). We should mention this in the docs.

We have interop with C++ mentioned in the docs now: https://github.com/dotnet/docs/blob/main/docs/standard/native-interop/abi-support.md#c-1 . cc @AaronRobinsonMSFT

It's been so far (even since .NET Framework days) possible to achieve actual C++ interop by generating C# P/Invoke code following C++ ABI semantics, "lowering" it to C-compatible P/Invoke signatures. This is the approach we use in CppSharp, through which we don't need to generate generally unnecessary C wrappers.

ARM64 is the first platorm where this is not actually possible. I think this is easier to explain with a simple example:

struct NonPod16 { ... };
NonPod16 CreateNonPod(intptr_t b)

This ends up as (in LLVM IR format for easier understanding):

define void @_Z12CreateNonPodl(ptr noalias sret(%struct.NonPod16) align 8 %0, i64 noundef %1) #1 {

So the return-by-value gets transformed into what LLVM calls sret, as struct return. We get that information from LLVM/Clang directly and generate our P/Invoke as:

static unsafe extern void CreateNonPod(ref S16 ret, long b);

This has worked great for a long time. Now, ARM64 is what is causing issues for us. failing with the case above.
This is what LLVM has to say:

// In AAPCS, an SRet is passed in X8, not X0 like a normal pointer parameter.
// However, on windows, in some circumstances, the SRet is passed in X0 or X1
// instead.

Which makes sense, it means we cannot rely on this trick, struct returns follow a separate calling convention, which is not the case on other platforms, and .NET itself has recognize this is a special return parameter to generate valid ABI code.

If we move the return back as a proper return in P/Invoke, things start working:

static unsafe extern S16 CreateNonPod(long b);

[StructLayout(LayoutKind.Explicit, Size = 16)]
public partial struct S16 { ... }

The only issue is that there is yet another special case where there is a distinction in the ABI depending if the struct is bigger than 16 bytes or not to decide if its passed in registers or in the stack.

The .NET runtime already implements the proper ABI behaviour for both cases, we just have no way to signal to P/Invoke which case it should use. We don't want or need the .NET runtime to be able to figure out from the struct layout which case it needs to use, as that would mean encoding C++ non-pod logic in the runtime. We already know what we need .NET to do, we just need a boolean attribute so we can tell P/Invoke what it needs to do.

Currently we can force .NET to do the right thing with:

[StructLayout(LayoutKind.Explicit, Size = 17)]

But this has other potential implications and is not really correct.

What we need is for P/Invoke to recognize maybe a [StructLayout(LayoutKind.Explicit, Size = 16, PassMode = PassMode.Direct/Indirect)].

Or maybe:

static unsafe extern void CreateNonPod([SRet] ref S16 ret, long b);

Could something like this be considered to allow direct ARM64 C++ interop @jkotas @AaronRobinsonMSFT ?

@jkotas
Copy link
Member

jkotas commented Nov 10, 2024

This has worked great for a long time. Now, ARM64 is what is causing issues for us.

I think you are lucky that you have not run into C++ calling convention issues on other architectures. For example, I would expect that C++ interop on win x86 would run into #100033 that the runtime historically must have special handling for to make managed C++ work.

Could something like this be considered to allow direct ARM64 C++ interop @jkotas @AaronRobinsonMSFT ?

I do not think we would want to introduce one-off workarounds like this. If we were to do something for C++, we would want to look into what it would take to support C++ calling conventions holistically in the runtime.

@tritao
Copy link
Contributor

tritao commented Nov 10, 2024

This has worked great for a long time. Now, ARM64 is what is causing issues for us.

I think you are lucky that you have not run into C++ calling convention issues on other architectures. For example, I would expect that C++ interop on win x86 would run into #100033 that the runtime historically must have special handling for to make managed C++ work.

I don't know the full details of that particular one, but we do invoke copy constructors ourselves to handle such class of issues.

Could something like this be considered to allow direct ARM64 C++ interop @jkotas @AaronRobinsonMSFT ?

I do not think we would want to introduce one-off workarounds like this. If we were to do something for C++, we would want to look into what it would take to support C++ calling conventions holistically in the runtime.

That really complicates the runtime though, it means basically keeping a significant part of the backend portion of a C++ compiler to replicate all of the ABI logic. Clang already implements all that logic, we can just take advantage of it as CppSharp does. What would be the benefits of encoding C++ ABI interop into the runtime when the same can be achieved without introducing additional complicated machinery with source generators (like COM wrapper generator)?

Really all we need to achieve this is a relatively simple way to signal P/Invoke what code path to take.

@jkotas
Copy link
Member

jkotas commented Nov 10, 2024

I don't know the full details of that particular one, but we do invoke copy constructors ourselves to handle such class of issues.

I do not think you can invoke copy constructors in this case with full fidelity.

it means basically keeping a significant part of the backend portion of a C++ compiler to replicate all of the ABI logic

It depends on the design.

@AaronRobinsonMSFT
Copy link
Member

AaronRobinsonMSFT commented Nov 10, 2024

Really all we need to achieve this is a relatively simple way to signal P/Invoke what code path to take.

More of a nit and getting a bit pedantic about implementation details, but this is misleading. The interop system emits IL and then passes it to the JIT. The statement above should be to inform the JIT on how to pass the argument.

This approach is possible, but would then create a likely fragile system as the JIT team would need to consider all interactions in the places where this information needs to be respected, potentially impacting correct use of the target ABI . It would also bring up other questions, is it respected on all CPU/OS combinations? When do users know when to use it or not? This would make for a very complex feature to use and document expectations.

it means basically keeping a significant part of the backend portion of a C++ compiler to replicate all of the ABI logic

This is conceptually simpler than one would think. It would mean the JIT can know it is mode X or Y, not one off cases that require more consideration due to potential conflicts with the current target C ABI.

[StructLayout(LayoutKind.Explicit, Size = 16, PassMode = PassMode.Register/Stack)].

My guess here is this would actually need to be even more specific and require telling the JIT precisely which register to use.

@josetr
Copy link

josetr commented Nov 10, 2024

@AaronRobinsonMSFT @jkotas

https://godbolt.org/z/rE3Mqfnrj

I think this example shows pretty clearly our problem

.NET, because the struct is small enough and only cares about C ABI, still thinks it can call the function in the optimized "direct" mode way, where the struct is returned / stored directly into the x0 / x1 registers

However, this doesn't always apply in C++. Such direct mode optimization is ignored when the returned struct has a destructor, copy constructor, vtable, etc. It will instead fallback to indirect mode, which is what is always used when the struct is bigger than 16 bytes.

We just need a way to tell .NET to force indirect mode, because we know better, and because we are using LLVM to get such accurate info.

I think it's fair to ask for an option that allows us to achieve this, even if its behind EditorBrowsableState.Never.

So we need something like

  1. if (size > 16 || forceIndirectMode)

or some c++ support

  1. if (size > 16 || structLayout.cpp_flags.has_destructor || structLayout.cpp_flags.has_copyctor || structLayout.cpp_flags.has_vtable) (this is the cases we've found so far, dunno if there are more)

@tannergooding
Copy link
Member

Such direct mode optimization is ignored when the returned struct has a destructor, copy constructor, vtable, etc. It will instead fallback to indirect mode, which is what is always used when the struct is bigger than 16 bytes.

This isn't strictly true. It depends on the C++ ABI, which differs for Windows vs Unix and sometimes per architecture as well (x64 vs Arm64 vs ...).

Unix typically follows https://itanium-cxx-abi.github.io/cxx-abi for example.

Likewise, any struct bigger than 16 bytes being passed by reference isn't true either. It is rather dependent on the field types, sizes, and multiple other factors. A struct containing 4 float fields is classified as an HFA (homogenous floating-point aggregate) and is allowed to be passed in register still. Similar exists for HVA (homogenous vector aggregates).

Interop with C++ is also typically not recommended because the C++ ABI is "unstable". It changes with new language features, with subtle changes to your type definitions, based on "private implementation details", and many other factors. It's an incredibly complex area and best practice (for interop with any language) is to use a C wrapper so you can have a definitively stable ABI instead.

@josetr
Copy link

josetr commented Nov 11, 2024

Such direct mode optimization is ignored when the returned struct has a destructor, copy constructor, vtable, etc. It will instead fallback to indirect mode, which is what is always used when the struct is bigger than 16 bytes.

This isn't strictly true. It depends on the C++ ABI, which differs for Windows vs Unix and sometimes per architecture as well (x64 vs Arm64 vs ...).

I was only talking about my godbolt example, which targets Clang ARM64. We already handle the differences you've mentioned by using LLVM. It correctly told us what we needed to do for Mac ARM64, but we have no way of translating that accurate information to .NET so that it can do the right thing.

Adding something that is similar to forceIndirectMode is good enough for me tho, I'll take anything.

We also already know about the C wrapper solution, but that obviously has a performance cost, and a build step.

@tannergooding
Copy link
Member

If you're already willing to hardcode the fact that it needs to be passed by reference, you can just encode that in your P/Invoke signature (i.e. ref T value or T* value).

Anything more accurate will require taking in and supporting defining all the various relevant C++ ABI metadata pieces, which is likely a non-starter.

@tritao
Copy link
Contributor

tritao commented Nov 11, 2024

Such direct mode optimization is ignored when the returned struct has a destructor, copy constructor, vtable, etc. It will instead fallback to indirect mode, which is what is always used when the struct is bigger than 16 bytes.

This isn't strictly true. It depends on the C++ ABI, which differs for Windows vs Unix and sometimes per architecture as well (x64 vs Arm64 vs ...).

Unix typically follows https://itanium-cxx-abi.github.io/cxx-abi for example.

Likewise, any struct bigger than 16 bytes being passed by reference isn't true either. It is rather dependent on the field types, sizes, and multiple other factors. A struct containing 4 float fields is classified as an HFA (homogenous floating-point aggregate) and is allowed to be passed in register still. Similar exists for HVA (homogenous vector aggregates).

We're quite aware of this but it's all irrelevant since Clang already handles all of those details for us.

Interop with C++ is also typically not recommended because the C++ ABI is "unstable". It changes with new language features, with subtle changes to your type definitions, based on "private implementation details", and many other factors. It's an incredibly complex area and best practice (for interop with any language) is to use a C wrapper so you can have a definitively stable ABI instead.

This is commonly said, but we have proved with CppSharp that the C++ ABI is actually very stable. We have had no single instance of the C++ ABI changing in over a decade. In fact, the only change was in the standard library, std::string ABI changed with C++11 if you want to include that in the definition of the C++ ABI.

And even if it changed, it's a non issue since all the bindings are generated and we get the accurate info for the ABI details (including vtable layouts) from LLVM and Clang. A simple re-generation and we're back to business.

If you're already willing to hardcode the fact that it needs to be passed by reference, you can just encode that in your P/Invoke signature (i.e. ref T value or T* value).

Anything more accurate will require taking in and supporting defining all the various relevant C++ ABI metadata pieces, which is likely a non-starter.

This is what we use in all non-ARM64 platforms, but as I explained in my post above, this does not work for ARM64 due to a special ABI for indirect struct returns (X0 vs X8 register).

As @josetr said, what we need is in essence a single line change here:

We need to be able to signal to P/Invoke that take the indirect code path. Right now .NET has a very good direct C++ interop story with CppSharp, and is used a lot by the game dev community. All we request is for some consideration to be made here to the needs of C++ interop in P/Invoke to keep this good support going for ARM64.

@tannergooding
Copy link
Member

This is commonly said, but we have proved with CppSharp that the C++ ABI is actually very stable

This needs to be clarified to note that it has shown as stable in your particular scenario.

The broader ecosystem for the language as a whole is known to have issues and where those come into play. In the case of the STL, all three major C++ runtime implementations have a set of known pending ABI breaks that they need to take at some point in the future. In some cases these are necessary to support newer language versions or ensure they are standards compliant. All three major implementations do try and keep breaks down and keep things semi stable, any major C++ library would ideally be doing the same, but that is a distinct consideration.

what we need is in essence a single line change here:

This isn't a single line change and isn't as simple as its portrayed. Fixing this one particular instance could have some smallish (but still a couple hundred lines total across the VM, JIT, and BCL) to ensure it worked. However, that also has risk of opening the floodgates to other similar changes/requests. In the most correct case, it involves adding a much broader set of attributes/annotations that allow users to specify what C++ concepts exist in a type including (but not limited to) inheritance, virtual members, constructors, destructors, defaulted vs non-defaulted members, and a few other things that are known to impact the ABI depending on whether or not they're defined.

There's an entire range of possible solutions, but none of them are really "trivial" and leave open questions as to the broader needs, designs, concerns, and ecosystem impact.

@tannergooding
Copy link
Member

Notably @AaronRobinsonMSFT and @jkoritzinsky are likely the right ones to give official consideration as to right direction (if any) here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Interop-coreclr documentation Documentation bug or enhancement, does not impact product or test code
Projects
Status: No status
Development

No branches or pull requests

7 participants