Game packaged as an ipa crashes at startup on iOS #1609

Open
kring opened this issue Feb 7, 2025 · 5 comments
Labels
bug Something isn't working

Comments

@kring
Member

kring commented Feb 7, 2025

Mentioned on the community forum here:
https://community.cesium.com/t/solved-ios-apps-distributed-to-the-app-store-crash/29401
https://community.cesium.com/t/issues-with-cesium-in-ue5-4-on-ios/37756/3

Packaging a game and running it on an iPhone from Xcode works fine. But if we create an "archive" (ipa) and then run that on the device, it crashes very quickly after showing the splash screen. The immediate cause of the crash is that a std::string's destructor is called, and the memory it holds was not allocated by the standard operator new. That's because it was allocated from Cesium3DTileset.cpp, which, being compiled in the usual Unreal way, overrides operator new with its own custom allocator.

The fundamental problem is that the basic_string constructor that gets called is found in our Unreal game, so it uses Unreal's custom allocator, while the basic_string destructor is found in libc++.dylib, so it uses the standard allocator.
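To make the failure mode concrete, here is a minimal sketch of a hypothetical header-prefixing allocator. `custom_heap` is an illustrative name, not Unreal's actual allocator. Many custom allocators hand out a pointer that is not the start of the underlying block, or not a malloc block at all. Passing such a pointer to a different allocator's free routine, as happens when a default `operator delete` built on `free` receives it, is undefined behavior and in practice often a crash.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical engine-style heap (illustration only). It over-allocates and
// places a small header in front of the pointer it returns, so that pointer
// is NOT the address malloc returned. Only this heap knows how to free it.
namespace custom_heap {

constexpr std::uint32_t kMagic = 0xC0FFEEu;

// alignas keeps the user pointer (header + 1) suitably aligned.
struct alignas(std::max_align_t) Header {
  std::uint32_t magic;
  std::size_t size;
};

void* allocate(std::size_t n) {
  void* raw = std::malloc(sizeof(Header) + n);
  if (raw == nullptr) throw std::bad_alloc();
  Header* h = static_cast<Header*>(raw);
  h->magic = kMagic;
  h->size = n;
  return h + 1;  // the caller only ever sees the bytes after the header
}

// Sanity check; only meaningful for pointers returned by allocate().
bool has_valid_header(const void* p) {
  return (static_cast<const Header*>(p) - 1)->magic == kMagic;
}

void deallocate(void* p) {
  // Correct: step back to the block malloc actually returned.
  std::free(static_cast<Header*>(p) - 1);
  // Calling std::free(p) directly, which is effectively what a mismatched
  // default operator delete does, would hand free() a pointer it never
  // returned: undefined behavior.
}

}  // namespace custom_heap
```

Usage: `void* p = custom_heap::allocate(32);` must be paired with `custom_heap::deallocate(p);`, never with `std::free(p)` or a different `operator delete`.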

Rant: this is why overriding operator new and delete is a terrible idea. Think you can get better performance with a custom allocator? Well, ok, I'm skeptical that that's true if it's a general purpose allocator, but go ahead if you want to try. But call your custom allocator explicitly from your code. Overriding the global new and delete makes it pretty much impossible for dynamically-allocated objects to cross .dll/.so/.dylib boundaries. Which maybe doesn't sound too bad, until you consider that "objects" includes std::string, and ".dll/.so/.dylib" includes libc++.so/.dylib. It's pretty much just dumb luck that this works on other platforms where the C++ standard library is dynamically linked.

I spent a lot of time trying to understand why the constructor/destructor end up coming from two different places. Based on stepping through the disassembly with the xcode debugger (with the app running on the iPhone), I can see:

  • Neither the constructor nor the destructor is inlined. Both are actual function calls.
  • The constructor implementation is found in our app binary.
  • The destructor implementation is found in libc++.dylib.

Why does this happen? Well, in Apple's libc++ (taken from iOS SDK v18.2), the relevant std::string constructor is declared/defined like this:

  template <__enable_if_t<__is_allocator<_Allocator>::value, int> = 0>
  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string(const _CharT* __s) {
     // ...
  }

The destructor looks like this:

  inline _LIBCPP_CONSTEXPR_SINCE_CXX20 ~basic_string() {
    // ...
  }

So the constructor has _LIBCPP_HIDE_FROM_ABI. Among other things, that adds the __exclude_from_explicit_instantiation__ attribute. Here are the clang docs on what that means:
https://clang.llvm.org/docs/AttributeReference.html#exclude-from-explicit-instantiation

Long story short, methods marked with this attribute will be implicitly instantiated in every translation unit (cpp file) where they're used. Never will an implementation from libc++.dylib be used. That's consistent with what I'm seeing. Why? I have no idea!
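The pattern can be reproduced in a small, self-contained sketch. `Holder` is a hypothetical stand-in for `basic_string`: its constructor carries the same clang attribute that `_LIBCPP_HIDE_FROM_ABI` adds, so every translation unit that uses it instantiates its own copy. The destructor is an ordinary member covered by an `extern template` declaration (as we read the headers, libc++ declares `basic_string<char>` this way), so callers are expected to use the copy in whichever binary holds the explicit instantiation, which is libc++.dylib in the real case. Both halves live in one file here only so it compiles on its own, and the destructor is defined out of class for simplicity. On non-clang compilers the attribute macro expands to nothing.

```cpp
// Use clang's attribute when available; other compilers get an empty macro.
#if defined(__clang__)
#define HIDE_FROM_ABI __attribute__((exclude_from_explicit_instantiation))
#else
#define HIDE_FROM_ABI
#endif

// Hypothetical stand-in for std::basic_string (illustration only).
template <typename T>
class Holder {
 public:
  // Like basic_string(const _CharT*): excluded from explicit instantiation,
  // so it is always instantiated locally, in the caller's binary.
  HIDE_FROM_ABI explicit Holder(T value) : ptr_(new T(value)) {}

  // Like ~basic_string(): an ordinary member. Callers that see the extern
  // template declaration below may call the library's copy instead.
  ~Holder();

  Holder(const Holder&) = delete;
  Holder& operator=(const Holder&) = delete;

  T get() const { return *ptr_; }

 private:
  T* ptr_;
};

template <typename T>
Holder<T>::~Holder() {
  delete ptr_;  // runs whichever operator delete the *destructor's* binary sees
}

// What a library header says: "this instantiation lives elsewhere".
extern template class Holder<int>;

// What the library's own .cpp contains (normally compiled into the dylib).
template class Holder<int>;
```

If the two halves were split across an app and a shared library that each see a different `operator new`/`operator delete`, the `new` in the constructor and the `delete` in the destructor could come from different allocators, which is the mismatch described above.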

Meanwhile, the destructor is declared inline. Well, it's not getting inlined. If it were, there wouldn't be a problem, because both the constructor and destructor would then use Unreal's allocator. But no, it's not inlined, despite the fact that it's declared inline. That's a little odd, but not shocking. For one thing, I reckon that inline here doesn't actually do anything at all, because the method implementation is found inside the class definition in this case, which implies inline. And second, inline doesn't actually mean inline, because C++ is like that. See here:
https://en.cppreference.com/w/cpp/language/inline

The original intent of the inline keyword was to serve as an indicator to the optimizer that inline substitution of a function is preferred over function call, that is, instead of executing the function call CPU instruction to transfer control to the function body, a copy of the function body is executed without generating the call. This avoids overhead created by the function call (passing the arguments and retrieving the result) but it may result in a larger executable as the code for the function has to be repeated multiple times.

Since inline substitution is unobservable in the standard semantics, compilers are free to use inline substitution for any function that's not marked inline, and are free to generate function calls to any function marked inline. Those optimization choices do not change the rules regarding multiple definitions and shared statics listed above.

So clang isn't necessarily doing anything wrong by not inlining the destructor. 🤷

Alright, so the compiler is doing what it should, and it's easy to see why that causes a crash. The problem is either in Unreal Engine or in libc++, or in any case the interaction between the two of them. What can we do about it? Well, one solution is to tell Unreal not to use a custom allocator. This is done by defining FORCE_ANSI_ALLOCATOR=1. That reportedly worked in older versions of Unreal Engine, but not in newer ones (I haven't tried myself). It's not surprising that there would be problems when compiling the plugin and app with the ANSI allocator and the rest of Unreal Engine with its own allocator. Perhaps compiling all of Unreal Engine with ANSI would work, but that's a drastic step for people that just want to package their game for iOS.

I'm low on other ideas. Perhaps there are some compiler flag tweaks that would cause that destructor to get inlined. But even if we found such a solution, it feels precarious. Or go the other way, and construct all of our strings with a custom STL allocator that is guaranteed not to use Unreal's custom new/delete functions. But the allocator is part of std::string's type, so we'd basically have to specify it everywhere in cesium-native's public API.
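As a sketch of the "custom STL allocator" option, assuming the engine replaces only `operator new`/`operator delete` and not `malloc`/`free` themselves, an allocator can bypass the global operators entirely. `MallocAllocator` and `MallocString` are hypothetical names. Because both allocation and deallocation go through the allocator object, it no longer matters which binary the constructor or destructor code came from. The last line shows the cost described above: the result is a distinct type from `std::string`.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <type_traits>

// Hypothetical allocator that never touches the (possibly replaced) global
// operator new/delete. Assumption: malloc/free are NOT replaced by the engine.
template <typename T>
struct MallocAllocator {
  using value_type = T;

  MallocAllocator() noexcept = default;
  template <typename U>
  MallocAllocator(const MallocAllocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    if (n > static_cast<std::size_t>(-1) / sizeof(T)) {
      throw std::bad_array_new_length();
    }
    void* p = std::malloc(n * sizeof(T));
    if (p == nullptr) throw std::bad_alloc();
    return static_cast<T*>(p);
  }

  void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// Stateless: any instance can free memory from any other.
template <typename T, typename U>
bool operator==(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept {
  return true;
}
template <typename T, typename U>
bool operator!=(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept {
  return false;
}

// The allocator is part of the type, so this is NOT std::string: every
// public API that accepts or returns a string would have to spell it this way.
using MallocString =
    std::basic_string<char, std::char_traits<char>, MallocAllocator<char>>;
```

Converting to a plain `std::string` (for example `std::string(s.begin(), s.end())`) copies the data and reintroduces the global allocator, which is why this choice would ripple through every public signature.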

Can we statically-link the C++ runtime library on iOS?

What other possibilities am I missing?

@kring kring added the bug Something isn't working label Feb 7, 2025
@kring
Member Author

kring commented Feb 7, 2025

Here's another Unreal plugin with the same problem:
getnamo/SocketIOClient-Unreal#402

@kring
Member Author

kring commented Feb 7, 2025

@kring
Member Author

kring commented Feb 10, 2025

Kicking this around in my head over the weekend, I almost had myself convinced that it's impossible to solve. If there's any possibility for the constructor of std::string (or anything) to be in our binary while the destructor is in libc++, then this allocator mismatch will happen, and there's nothing we can do to reliably fix it.

But then I realized I'm stuck in a Windows mindset. In Windows, functions shared between DLLs must be both explicitly exported and explicitly imported. There's no way that something like libc++ could import a symbol from our binary because the arrow of dependency just doesn't go that way.

But in Linux and macOS and iOS, it works differently. This is going to be an oversimplification, but an intuitive way to think of it is that dynamic linking on these platforms is just static linking that happens at runtime. When libc++ calls operator new, it should get Unreal's overridden version, just like static libraries linked into the same executable would. It does still require the symbol to be exported from the shared object, though, in order for others to be able to link against it.

So I now think that this is the problem: the operator new and operator delete overrides are not being exported, so the std::string destructor in libc++ can't see them and uses the default implementation instead of the overridden one.
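The mechanism can be observed on an ELF platform such as Linux, where an executable's replacement of the global allocation functions is visible to the shared C++ runtime. Below is a minimal sketch; the counters, `g_sink`, and `count_string_round_trip` are hypothetical helpers. It replaces the global `operator new`/`operator delete`, forwarding to `malloc`/`free` and counting calls, then builds and destroys a heap-allocated `std::string`. When the runtime's string code resolves to the replacements, allocations and deallocations balance. The hypothesis in this thread is that on iOS the stripped executable no longer exports these symbols, so the libc++ side falls back to its own.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Hypothetical stand-in for an engine-wide replacement of the global
// allocation functions. A real engine forwards to its own allocator;
// here we forward to malloc/free and count calls.
std::atomic<std::size_t> g_new_calls{0};
std::atomic<std::size_t> g_delete_calls{0};

void* operator new(std::size_t size) {
  g_new_calls.fetch_add(1, std::memory_order_relaxed);
  if (size == 0) size = 1;  // must return a unique non-null pointer
  if (void* p = std::malloc(size)) return p;
  throw std::bad_alloc();
}

void operator delete(void* p) noexcept {
  if (p == nullptr) return;
  g_delete_calls.fetch_add(1, std::memory_order_relaxed);
  std::free(p);
}

// Sized deallocation (C++14) may be called instead of the unsized form.
void operator delete(void* p, std::size_t) noexcept { ::operator delete(p); }

// Storing the buffer address here keeps the optimizer from eliding the
// allocation. The pointer is never dereferenced.
const char* volatile g_sink = nullptr;

struct Counts {
  std::size_t news;
  std::size_t deletes;
};

// Builds and destroys a std::string long enough to defeat the small-string
// optimization, returning how many allocations/deallocations it triggered.
Counts count_string_round_trip() {
  const std::size_t n0 = g_new_calls.load();
  const std::size_t d0 = g_delete_calls.load();
  {
    std::string s(200, 'x');
    g_sink = s.data();
  }
  return {g_new_calls.load() - n0, g_delete_calls.load() - d0};
}
```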

If I run nm --demangle on the packaged game executable directly built by Xcode (which works!), here are the operators:

00000001065a0870 T operator delete(void*)
00000001065a0838 T operator new(unsigned long)

However, when I run it on the binary in the IPA (which crashes!), those symbols are gone.
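nm reads the file on disk; a runtime cross-check is also possible. The helper below is a hypothetical diagnostic, not part of Cesium or Unreal. It asks the dynamic linker which loaded image supplies `operator new(std::size_t)`, using `dlsym` with `RTLD_DEFAULT` and `dladdr` (extensions available on both glibc and Apple platforms). It assumes the Itanium-ABI mangled name `_Znwm`, which holds where `std::size_t` is `unsigned long`, as on 64-bit iOS, macOS and Linux. In a build where the overrides are exported, we'd expect it to name the app binary; if they were stripped, we'd expect it to name the system C++ runtime library instead.

```cpp
#include <dlfcn.h>

#include <string>

// Returns the path of the loaded image that provides the global
// operator new(std::size_t) in the dynamic linker's default search order,
// or an empty string if it cannot be determined.
// Assumption: mangled name "_Znwm" (std::size_t == unsigned long).
std::string image_providing_operator_new() {
  void* sym = dlsym(RTLD_DEFAULT, "_Znwm");
  if (sym == nullptr) return {};

  Dl_info info{};
  if (dladdr(sym, &info) == 0 || info.dli_fname == nullptr) return {};
  return info.dli_fname;  // e.g. the app executable or a C++ runtime library
}
```

Logging this once at startup (for example from the game module's startup code) would show which allocator the C++ runtime will end up using, without needing to inspect the archived binary.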

I found this post with almost the same problem, that suggests these symbols are being stripped, and that it's possible to exclude them from stripping:
https://developer.apple.com/forums/thread/707242

So I'm going to try that next.

@kring
Member Author

kring commented Feb 10, 2025

It worked! If I open up Intermediate/ProjectFiles/dev (IOS).xcodeproj/project.pbxproj and shortly after /* Generate dsym for archive, and strip */ change the shellScript line to comment out the strip command:

#strip -no_code_signature_warning -D \"${CONFIGURATION_BUILD_DIR}/${EXECUTABLE_PATH}\"

Then the generated IPA runs just fine on the iPhone!

It should be possible to strip everything except the new and delete operators, rather than removing the symbol stripping entirely, but I haven't tried that yet.

This strip command is added by UnrealBuildTool here:
https://github.com/EpicGames/UnrealEngine/blob/5.5.2-release/Engine/Source/Programs/UnrealBuildTool/ProjectFiles/Xcode/XcodeProject.cs#L1523

So I can edit UBT to fix this, but that's super hacky. Still, it's at least a workaround for people running into this. I guess I'll report this to Epic; maybe they'll be willing to fix it in a future UE release, because AFAICT any Unreal app on iOS that tries to construct a std::string will crash in a fully store-ready build.

Still thinking about whether there's some way to fix this without changing Unreal Engine and without requiring end users to do hacky things.

@kring
Member Author

kring commented Feb 10, 2025

New workaround posted here:
https://community.cesium.com/t/solved-ios-apps-distributed-to-the-app-store-crash/29401/6?u=kevin_ring

Use that instead of commenting out the strip line like I did in the post above.
