Statically link the python executable to libpython and disable the shared library #540
base: main
Conversation
Does this affect the actual run-time performance of the Python interpreter? Or just the time to start a new process and init the interpreter? I.e. what is the benchmark actually measuring?
It seems to have a significant holistic effect, which matches expectations set by the conda-forge folks and @carljm. The referenced number is from the full pyperformance suite. It also seemed to drastically improve performance on the benchmark in #535. I can tweak the number of calculations so that it's not dominated by interpreter startup, i.e., a runtime per Python process of >5 s. I definitely intend to do more benchmarking before marking this as ready for review; I'll post full results then.
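To separate startup cost from steady-state performance, as discussed above, one can time both a bare interpreter startup and a longer workload for each binary under test. A minimal sketch (the workload here is illustrative, not the benchmark from #535):

```python
import statistics
import subprocess
import sys
import time

def median_runtime(python, code, runs=5):
    """Median wall-clock time (seconds) to run `code` in a fresh interpreter."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([python, "-c", code], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Startup-only vs. a real workload; if the two numbers are close, the
# benchmark is dominated by interpreter startup rather than runtime speed.
startup = median_runtime(sys.executable, "pass")
workload = median_runtime(sys.executable, "sum(i * i for i in range(10**6))")
```

Running both measurements against a statically linked and a shared-libpython binary makes it clear which phase the speedup comes from.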
Note to self: should consider updating the following:
- python-build-standalone/cpython-unix/build.py, Line 840 (in 4615f2f)
- python-build-standalone/cpython-unix/build.py, Line 855 (in 4615f2f)
- python-build-standalone/cpython-unix/build.py, Line 510 (in 4615f2f)
- python-build-standalone/cpython-unix/build.py, Line 531 (in 4615f2f)
- python-build-standalone/cpython-unix/build.py, Line 877 (in 4615f2f)
Though I think we may want to retain the shared library even if the executable links libpython statically.
I think it is important to understand why static linking is faster. It could be many different things, some of which might be fixable for shared libraries. As a first step I would compare binaries without PGO, LTO, and BOLT. Static linking unlocks all kinds of optimizations; I suspect what we're seeing is the result of aggressive inlining or something of that nature.

Also, I strongly prefer we still ship a libpython.so, even if Python doesn't link it. This gets you the performance without losing the shared library, which some customers will want.
Agree on all those points. As a note, @geofft has been investigating some other problems that statically linking would solve. I expect he'll engage on exploring this further.
Are you referring to symbol resolution issues with binary packages pulling in 3rd party libraries [that can overlap with the libraries we statically link]?
I think it's mostly related to downstream consumption requiring rpath hacks, like […].
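For context on the rpath hacks mentioned: when embedding a shared-libpython build, the embedding binary typically needs an rpath pointing at the distribution's lib directory or it won't find libpython at run time. A sketch of deriving those linker flags from `sysconfig` (the helper name is made up; a shared-libpython build is assumed):

```python
import sysconfig

def embed_link_flags():
    """Linker flags for embedding this Python, including an rpath so the
    resulting binary can locate libpython at run time (shared builds only)."""
    libdir = sysconfig.get_config_var("LIBDIR")        # e.g. .../install/lib
    ldversion = sysconfig.get_config_var("LDVERSION")  # e.g. "3.12"
    return [f"-L{libdir}", f"-lpython{ldversion}", f"-Wl,-rpath,{libdir}"]
```

A statically linked libpython.a sidesteps this entirely, since the interpreter code travels inside the binary.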
There is a very old (2002) Debian bug reporting that statically linking libpython is good for performance: https://bugs.debian.org/131813 There too it's about steady-state runtime performance, not startup cost. I think the idea is that there is less back-and-forth between the executable and the library, but it's a good question why this is actually true, given that most of the hot code paths should be fully within the library.

Debian does something unusual in that they ship a libpython.so too, and the way they do it is that they build twice, once with […].

If we were to ship a libpython.a then, yes, downstream consumers would have an easier time of things because no rpath is required. (Notably, […].)

Note that, as implied by what Debian does, whether we ship a libpython.a and/or a libpython.so is not necessarily correlated with which one our bin/python3 uses. So, we could (at the cost of a longer build time) ship a bin/python3 that statically links libpython but also continue to ship a shared library for people who want it. (I suppose it's also possible that this doesn't actually require two builds, and with sufficient changes to the CPython build system, you can get it to produce both a libpython.a and a libpython.so in the same build.)

Fun fact: for the third-party libraries, a handful of downstream consumers would have an easier time if we moved from a static e.g. Tcl/Tk to a shared one. (Notably, PyInstaller outputs a C binary whose splash screen uses Tcl/Tk, so they need the ability to get to those libraries from C, before they've unpacked the Python distribution.)
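Whether a given distribution's interpreter was built against a shared libpython can be checked from Python itself, via the configuration values CPython records at build time. A small sketch (the helper name is made up):

```python
import sysconfig

def libpython_linkage():
    """Report whether this build enabled a shared libpython, plus the
    library file name the build recorded."""
    shared = sysconfig.get_config_var("Py_ENABLE_SHARED")  # 1, 0, or None
    return {
        "enable_shared": bool(shared),
        # e.g. "libpython3.12.a" for static builds, "libpython3.12.so" for shared
        "ldlibrary": sysconfig.get_config_var("LDLIBRARY"),
    }
```

Comparing this output across a Debian python3, a conda python, and a python-build-standalone python is a quick way to see which linking strategy each chose.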
I wanted to note that statically linking libpython has yielded proven performance gains for Nuitka as well.
That makes more intuitive sense to me in that Nuitka compiles what it can, so you're going back and forth between the main program and libpython for the stuff that didn't get compiled. But […]

(One mildly weird idea, btw, is that it's possible for a shared library to have an entry point; try running […].)
Yeah, this is what I was getting at. Having them as separate libraries helps with symbol resolution issues. It was always on my undocumented backlog to split out at least tcl/tk and the x11 libraries into standalone shared libraries to mitigate this issue.

On the static vs. dynamic bit, I think the speedup is coming from the compiler/linker no longer having to provide strong ABI guarantees around functions. I think statically linking libpython is enabling it to more aggressively optimize functions without regard to function boundaries. It might be doing some funky copying of functions, because I thought you still needed to export the libpython symbols so loaded extension modules could continue using them.

You'd really need to do some low-level debugging, maybe disassembling, to get to the bottom of things. I'd feed the statically linked binary into Ghidra and look at the core interpreter loop to see if any funky inlining of libpython symbols is going on.
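On the point about still needing to export libpython symbols for extension modules: one can check from Python whether C-API symbols are resolvable against the running process, the same way a dlopen'ed extension module would resolve them. A sketch (assumes Linux/macOS dlopen semantics; on a build that didn't export these symbols, the lookup would fail):

```python
import ctypes

def capi_symbols_exported():
    """True if a core C-API symbol (Py_GetVersion) resolves against the
    running process, as loaded extension modules would resolve it."""
    try:
        # PyDLL(None) opens a handle to the main program and its libraries.
        handle = ctypes.PyDLL(None)
        handle.Py_GetVersion.restype = ctypes.c_char_p
        return handle.Py_GetVersion() is not None
    except (OSError, AttributeError):
        return False
```

If this returns True on a statically linked binary, the executable is exporting the interpreter symbols itself, which is consistent with extension modules continuing to load.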
Just as an additional FYI, it seems that some corner-case downstream uses of Python do not work as expected when using a statically linked Python; see for example (just a few I encountered in the past):
I recall also a lot of macOS segfaults in CMake projects creating extensions as […].
Found: pybind/pybind11#3907.
Yeah, these linked issues seemingly confirm what I thought: extension module builds really want to run against the Python they were built against. If there is a mismatch between the build and runtime Python, things can blow up.

In Conda's world, they have their own universe of binary dependencies. But in the PBS / uv world, there isn't as much of a buffer here. So my fear is that if PBS ships a static libpython, we're signing ourselves up for all kinds of random extension module breakage.

We could assess risk by downloading popular PyPI packages and verifying extension modules load and run. But the "run" part is difficult since there's no guaranteed way to run tests from a wheel. And even if PyPI is fine, you are going to find people building extensions behind corporate walls encountering issues.

I want to support this work. But I'm worried about side effects.
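The "load" half of that risk assessment is straightforward to sketch: install each candidate package against the build under test, then attempt the imports and record failures (the module list below is illustrative, and this only covers import-time breakage, not the harder "run" half):

```python
import importlib

def import_report(module_names):
    """Try to import each module; map name -> None on success, or the
    exception repr on failure (extension load errors raise ImportError)."""
    report = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            report[name] = None
        except Exception as exc:
            report[name] = repr(exc)
    return report

# e.g. import_report(["numpy", "lxml.etree", "cryptography"])
```

Running this over the top N PyPI packages with compiled extensions would give a rough signal on how much breakage a static libpython introduces.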
Thanks for sharing those @traversaro! That's helpful context. Just for some context on how I'm thinking about this pull request: I posted this for discussion and testing — I'm not in any rush to land this. |
Just to clarify, all those issues were related to conda installations, so that was not the problem, as compatible versions of […].
Statically linking […]. When […], […]. This flag is included by default since Python 3.10 if […].
We do use […].
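Assuming the flag under discussion is `-fno-semantic-interposition` (which CPython's configure adds on some platforms since 3.10 when optimizations are enabled; this is my reading of the truncated comment, not confirmed by the thread), whether a given build used it can be checked from the compiler flags the build recorded:

```python
import sysconfig

def built_without_semantic_interposition():
    """True if this build's recorded CFLAGS include
    -fno-semantic-interposition (relevant only to shared-libpython builds)."""
    cflags = sysconfig.get_config_var("CFLAGS") or ""
    return "-fno-semantic-interposition" in cflags
```

That flag lets the compiler inline and optimize across exported function calls within a shared libpython, which recovers some of the gap static linking otherwise opens up.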
As part of investigating #535, we posited that Conda's static linking of the python executable was part of the performance difference. This change gives a 10% performance improvement (geometric mean on pyperformance).