-
-
Notifications
You must be signed in to change notification settings - Fork 31.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-130090: Support PGO for clang-cl #129907
Conversation
for _freeze_module in case of clang-cl to speed up the build
Speeds up both MSVC and clang-cl builds. Should most probably done in a separate PR and issue, though.
Thanks! Just so I understand correctly: this PR does not break the existing MSBuild backend right? I think ideally, we'd want to allow users to switch between the two in the |
Correct. I had no problems doing the regular builds MSVC builds. Tried debug / release / PGO during my expirements. |
Done, still green, see below :)
This is already possible and the regular builds still work like they did before, e.g.
To do a clang-cl build, I use (similar to what @mdboom does like you mentioned in faster-cpython/ideas#690 (comment)):
to switch from the default PlatformToolset (i.e. MSVC v143, etc) to clang-cl. I personally prefer to set PreferredToolArchitecture, because
|
Oh, the build fleet started and is green, too. So this should confirm my local tests above? OOC, why did that happen? I thought they won't run for draft PRs? |
PCbuild/pythoncore.vcxproj
Outdated
@@ -420,6 +420,7 @@ | |||
<ClCompile Include="..\Modules\blake2module.c"> | |||
<PreprocessorDefinitions Condition="'$(Platform)' == 'x64'">HACL_CAN_COMPILE_SIMD128;%(PreprocessorDefinitions)</PreprocessorDefinitions> | |||
<PreprocessorDefinitions Condition="'$(Platform)' == 'x64'">HACL_CAN_COMPILE_SIMD256;%(PreprocessorDefinitions)</PreprocessorDefinitions> | |||
<AdditionalOptions>/arch:AVX</AdditionalOptions> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be removed right? It's not always true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My clang-cl builds error without that and it does not hurt MSVC AFACT.
It is also used for Hacl_Hash_Blake2b_Simd256.c
and Hacl_Hash_Blake2s_Simd128.c
, but there with Condition="'$(Platform)' == 'x64'
.
I think, I should add that condition here, too.
If in doubt about MSVC, I could also furthermore add Condition="$(PlatformToolset) == 'ClangCL'
.
PCbuild/pythoncore.vcxproj
Outdated
@@ -716,6 +717,18 @@ | |||
<Delete Files="$(IntDir)pyconfig.h;$(OutDir)pyconfig.h" /> | |||
</Target> | |||
|
|||
<Target Name="EnsureClangProfileDir" BeforeTargets="PrepareForBuild" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, we should not only ensure the profile dir, but empty it.
But most probably only when a full rebuild / clean is done.
@chris-eibl the Windows build expert for CPython has given their approval provided we don't affect any of the standard |
I haven't looked through the details of how it actually works, but I approve of the approach to making changes. (Most contributors to the build files make things actively worse in some ways, but it looks like you've done a great job and avoided it!) |
@zooba Maybe you can help me where and how to best place the cleaning of the clang profile dir? Regarding building the _freeze_module twice in case of PGO builds (commit 26fb51f): This is quite independent from this PR and I think I should create a separate PR (and issue), since the MSVC build will benefit from it, too? |
@chris-eibl do the |
The best way to handle cleaning is to add any generated files into the Avoiding the rebuild of |
I've previously gotten compile errors from clang, because the needed intrinsics were not available without that option. Cannot reproduce anymore. Most probably, because I've upgraded to Visual Studio 17.13.0 Preview 5.0, which now ships with clang 19.1.1 instead of 18.1.8 and they've done that for compatibility with MSVC? Anyway, let's keep the PR small :)
This reverts commit 26fb51f. Shall be done in a separate PR.
This better matches the behaviour of build.bat in case of MSVC PGO builds.
Since
I now always empty the profile dir. Seems anyway safer to me, so older data cannot kick in (although the names of those files have not yet changed for me - but better safe than sorry). |
Ah yeah. This is the same kind of cleanup as that, so you'll want to do it the same way. The MSBuild clean would delete critical files for the second stage. |
So this seems ready to me to pull out of draft. In case there is more todo before converting IMHO, #130040 should be merged first, so we have CI coverage that the non-PGO clang-cl build is still fine after this PR. PS: sorry @Fidget-Spinner, didn't want to already ask for a review. Happened accidentially when trying to find the "button" where I could convert from draft. |
Ah - found the button to convert from draft. That automatically asked for a review from python/windowsteam. Just cannot convert back to draft now. But I think it is okay anyway. Will go to merge with main, etc. Sorry for the churn, I am not yet fully acquainted with the procedure here - still learning ... |
Thanks! Can you open an issue and link it to this PR by adding it to the title? Something like gh-XXX: <Title> where XXX is the issue number. We usually need issues for nontrivial changes in case we need to deliberate further. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More proposed changes than I expected, but it's mostly organisation :) I'm impressed it's actually this simple.
@zooba thank you so much for your suggestions. |
Works for me either way, just let's not break the build. |
I have created a separate issue #130419 and PR #130420 (clang-cl needs almost a minute on my dusty PC, MSVC is indeed much faster). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some small comments for the instructions.
Co-authored-by: Ken Jin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of readme suggestions, but I think we're nearly there!
PCbuild/readme.txt
Outdated
All other build.bat options continue to work as with MSVC, so this | ||
will create a 64bit release binary. PreferredToolArchitecture is needed, | ||
because msbuild by default selects the 32bit architecture of the toolset, | ||
which uses -m32 as the default target architecture. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is genuinely required, let's add a <PreferredToolArchitecture Condition="$(PROCESSOR_ARCHITECTURE) == 'AMD64'">x64</PreferredToolArchitecture>
to the clangcl.props. I prefer to not override MSVC defaults for this property, but if Clang can't handle cross-compiling, then provided we know that it's been requested we may as well override defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I don't know why I need it when building with msbuild. I do not need it in case of the IDE or when explicitely using a custom installed toolset (most probably because I've always installed 64bit versions of those). I like your suggestion and will try it out - looks promising!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks promising!
But did not work out: most probably, we're "too late" when setting that, meaning that these lines in Microsoft.Cpp.ClangCl.Common.props
are already "resolved":
<_DefaultLLVMInstallDir Condition="'$(_DefaultLLVMInstallDir)' == '' AND '$(PreferredToolArchitecture)' == 'arm64'">$(VsInstallRoot)\VC\Tools\Llvm\ARM64</_DefaultLLVMInstallDir>
<_DefaultLLVMInstallDir Condition="'$(_DefaultLLVMInstallDir)' == '' AND '$(PreferredToolArchitecture)' == 'x64'">$(VsInstallRoot)\VC\Tools\Llvm\x64</_DefaultLLVMInstallDir>
<_DefaultLLVMInstallDir Condition="'$(_DefaultLLVMInstallDir)' == '' AND '$(PreferredToolArchitecture)' != 'x64'">$(VsInstallRoot)\VC\Tools\Llvm</_DefaultLLVMInstallDir>
<LLVMInstallDir Condition="'$(LLVMInstallDir)' == ''">$(_DefaultLLVMInstallDir)</LLVMInstallDir>
Anyway, using the PreferredToolArchitecture
as a way to steer the architecture of the generated binaries was a bad idea. So to get a 32bit binary, one would have to use "/p:PreferredToolArchitecture=x86"
:-O
clang-cl
of course can cross-compile, so let's explicitely set the architecture based on $(Platform).
"/p:CLANG_PROFILE_PATH=<relative-path-to-instrumented-dir-on-remote-host>" | ||
in the PGInstrument step to make sure the profile data is generated | ||
into the instrumented directory when running the PGO task. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs a worked example. Just from this description, I couldn't confidently specify the parameter and trust that I got it right.
Co-authored-by: Steve Dower <[email protected]>
to be independent from the selected architecture or the PlatformToolset
@@ -39,6 +39,8 @@ | |||
<ItemDefinitionGroup> | |||
<ClCompile> | |||
<AdditionalOptions>-Wno-deprecated-non-prototype -Wno-unused-label -Wno-pointer-sign -Wno-incompatible-pointer-types-discards-qualifiers -Wno-unused-function %(AdditionalOptions)</AdditionalOptions> | |||
<AdditionalOptions Condition="'$(Platform)' == 'Win32'">-m32 %(AdditionalOptions)</AdditionalOptions> | |||
<AdditionalOptions Condition="'$(Platform)' == 'x64'">-m64 %(AdditionalOptions)</AdditionalOptions> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an ARM64 option as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad you found this option, though! Much neater way to do it (and a bit embarrassing that the built-in targets don't do it...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an ARM64 option as well?
I think so, yes. I see some hits in the net, and did a clang-cl /?
, which e.g. reveals option /arm64EC
.
But I do not know what to actually write here in case of '$(Platform)'=='ARM64'
- and more importantly how to test it, since I have no arm based device.
I think we should leave that for another PR of someone who does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. @brandtbucher might be that someone. But at the conditions here will just not set an option on that platform and so it should work the same as today.
Co-authored-by: Steve Dower <[email protected]>
Okay, let's take it! It's not going to hold up any releases, so if it needs fixing then people will find it as they're testing. |
Can you share some numbers on the statements:
|
Oh sorry, I don't have hard numbers for that - I've never measured with the stop watch during watching the PGInstrument / gather profile data / PGUpdate steps, i.e. the build step. I just see the "pgo task" Maybe this can be seen in the CI build logs - there is an option to enable timestamps for the log entries. If you are interested in |
msbuild at the end of the PGInstrument and PGUpdate build step does report numbers, though, but I have not noted them for the various builds I've done - I was just too much interested in the |
I just benchmarked this on the Faster CPython team's benchmarking hardware, which has the Visual Studio preview channel, which is currently:
The build with clang+pgo is about 19% faster overall vs. msvc+pgo. Notably, however, some benchmarks clearly have a major regression, so it's not a "win across the board". And of course, I'm only talking about performance and not all of the other myriad of issues that come with changing the compiler. |
Thanks @mdboom for your results! Always glad to see other benchmarks. This fits my results presented on https://gist.github.com/chris-eibl/114a42f22563956fdb5cd0335b28c7ae, too. If I compare the
And like in your results above, Like I've written on dpo: Interestingly, clang 18.1.8 is faster than 19.1.1. Tailcalling always gains speed. Details
|
Like currently discussed in #128718, clang 18.1.8 is faster than 19.1.1, and 20.1.0.rc2 with tailcalling is the fastest:
But even for 20.1.0.rc2 with PGO+tailcalling, Details
|
@chris-eibl You might be up for this as a project - we're incredibly unlikely to switch the default compiler for our releases, due to other compatibility concerns (not least that many users are explicitly using MSVC for builds/extensions, and would be massively inconvenienced, especially since clang looks very much like a moving target right now). But there's some possibility that we might be able to use Clang just for the interpreter loop and continue using MSVC for public interfaces (which goes as deep as the entire CRT and our custom memory allocators, IMHO, unless you can somehow prove that Clang does and will always produce identical behaviour to MSVC). I'm thinking of a single static library project just for the interpreter loop compiled with Clang, which we then link in using MSVC rather than compiling it directly. There are a lot of challenges, but it's going to be the only way (other than persuading MSVC to improve their optimisations, which we're already doing) to get any of these benefits by default any time soon. If you're up for it, I don't know that anyone else is actively working on it right now, so go for it! There might be an existing issue somewhere - I've only had high-level discussions about it - but if you can't find one then start a new one. |
Oh, I think this might be a massive community effort, but yeah, someone somewhere has to start it somehow :)
Yeah, I immediately agreed with you on that part on dpo - and still do. Switching to clang-cl for the official builds is a far too high risk, since the user base is so big - and the compatibilty is too undetermined. If e.g. @zanieb lands astral-sh/python-build-standalone#549, we'd get feedback regarding compatibility issues and a much smaller user base is affected.
This is an interesting approach and might help us with compatibility concerns. I remember another project did something similar to get computed gotos (or the like) into an MSVC build using MingW or clang. AFAIR, they built some *.c (or just one?) files with one compiler and the rest with MSVC. But PGO and LTO are then most certainly problematic.
clang-cl just "shims" the compiler and linker. It What at least the pyperformance tests show: the clang-cl artifacts work with MSVC compiled extensions, since at least some of them come pre-compiled from pypi (e.g.
Most probably, the clang team would consider this as a bug and try to fix it. I can't speak for them (and maybe I am totally wrong here), but IMHO we cannot risk that dependency without having any measure about the compatibily wrt to the Python user base. There is https://clang.llvm.org/docs/MSVCCompatibility.html, all the C++ parts there have no effect on Python. However, just because it compiles and links doesn't mean all extensions out there continue to work without issues. I think we'll have to do baby steps here and support early adopters by fixing issues they run into - as long as this does not produce too much churn in the code base.
Very much appreciated!
Yeah, that discussion needs to happen in a separate issue, not this PR. I did a is:issue state:open clang-cl search but it did not reveal anything suitable. Maybe you and / or @Fidget-Spinner want to be a patron(s) of it, since he fancied to use clang-cl for the official binaries? OTOH, maybe we'd first start on dpo to find out, what early adopters need to get started? Building with clang-cl IMHO is now pretty well supported. Once we have #130040, there is at least one CI schedule that will help us to not break clang-cl builds. |
See #131005 for some numbers. |
Per encouragement from @Fidget-Spinner in faster-cpython/ideas#690 (comment) opened as a draft PR:
Support PGO for clang-cl on Windows using a similar approach as done in the Linux makefiles for clang.
I separate the clang-cl profiles into e.g.
obj\314amd64_PGInstrument\__clang_profiles
, so that they don't clash with different build configurations.Since pythoncore is built first and not in parallel, this was the easiest spot for me.
Fore sure room for improvements, I am definitely far from being an msbuild / vcxproj expert :)
BTW, clang-cl PGO takes far less time to build compared to MSVC, because
Some thoughts:
-flto-thin
to improve build time over execution time. Tried it, doesn't cost much in run time, but speeds up the build a lot._Py_HOT_FUNCTION
, etc. Most probably in follow up PRs (very simple and small)