gh-130090: Support PGO for clang-cl #129907

Merged on Mar 4, 2025 (24 commits)

chris-eibl
Contributor

@chris-eibl chris-eibl commented Feb 9, 2025

Per encouragement from @Fidget-Spinner in faster-cpython/ideas#690 (comment) opened as a draft PR:

Support PGO for clang-cl on Windows using a similar approach as done in the Linux makefiles for clang.
I separate the clang-cl profiles into e.g. obj\314amd64_PGInstrument\__clang_profiles, so that they don't clash with different build configurations.

Since pythoncore is built first and not in parallel, this was the easiest spot for me.
For sure there is room for improvement; I am definitely far from being an msbuild / vcxproj expert :)

BTW, clang-cl PGO takes far less time to build compared to MSVC, because

  • running the instrumented pgo tests is much faster
  • the PGO step is faster

Some thoughts:

  • Linux uses -flto=thin to favor build time over execution time. I tried it: it doesn't cost much at run time, but speeds up the build a lot.
  • There are still many warnings. I've fixed many of them locally; if there is interest, we can address them in follow-up PRs. IMHO, some of them are not benign, e.g.
..\PC\launcher2.c(1081,34): warning : result of comparison of constant 191 with expression of type 'char' is always false [-Wtautological-constant-out-of-range-compare] [e:\cpython_clang\PCbuild\pywlauncher.vcxproj]
..\PC\launcher2.c(1081,18): warning : result of comparison of constant 187 with expression of type 'char' is always false [-Wtautological-constant-out-of-range-compare] [e:\cpython_clang\PCbuild\pywlauncher.vcxproj]
  • let clang-cl benefit from _Py_HOT_FUNCTION, etc. Most probably in follow up PRs (very simple and small)
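
To illustrate why the two launcher2.c warnings above may not be benign, here is a minimal Python sketch of the underlying C behavior. It assumes that plain `char` is signed, which is the default for MSVC and clang-cl on x86/x64; the helper name `as_signed_char` is made up for this illustration and is not part of CPython.

```python
def as_signed_char(byte_value: int) -> int:
    """Reinterpret an 8-bit pattern (0..255) the way a signed C `char` would."""
    return int.from_bytes(bytes([byte_value]), "little", signed=True)


# The warnings mention the constants 187 and 191 (0xBB and 0xBF).
for constant in (187, 191):
    stored = as_signed_char(constant)
    # A signed char holding this byte is negative, so it never equals the constant.
    print(f"byte 0x{constant:02X}: as signed char = {stored}, "
          f"equal to {constant}? {stored == constant}")

# Masking with 0xFF models a cast to `unsigned char`, which makes the comparison work.
recovered = as_signed_char(191) & 0xFF
print(recovered == 191)
```

In C, the comparison the compiler flags can therefore never be true as long as the left-hand side has type `char`; comparing through `unsigned char` is one common way to make such a check meaningful.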

for _freeze_module in case of clang-cl to speed up the build
Speeds up both MSVC and clang-cl builds.

Should most probably be done in a separate PR and issue, though.
@Fidget-Spinner
Member

Fidget-Spinner commented Feb 9, 2025

Thanks! Just so I understand correctly: this PR does not break the existing MSBuild backend right?

I think ideally, we'd want to allow users to switch between the two in the PCBuild.bat as an optional flag to pass in.

@chris-eibl
Contributor Author

Correct. I had no problems doing the regular MSVC builds. I tried debug / release / PGO during my experiments.
Currently verifying, just in case I messed up condensing the PGO stuff from my local work ...

@chris-eibl
Contributor Author

> Currently verifying, just in case I messed up condensing the PGO stuff from my local work ...

Done, still green, see below :)

> I think ideally, we'd want to allow users to switch between the two in the PCBuild.bat as an optional flag to pass in.

This is already possible and the regular builds still work like they did before, e.g.

  • build.bat -c Debug -p Win32
  • build.bat -c Release -p x64
  • build.bat --pgo, this one builds even faster now, due to 26fb51f

To do a clang-cl build, I use (similar to what @mdboom does like you mentioned in faster-cpython/ideas#690 (comment)):

build.bat --pgo "/p:PlatformToolset=ClangCL" "/p:PreferredToolArchitecture=x64"

to switch from the default PlatformToolset (i.e. MSVC v143, etc) to clang-cl.

I personally prefer to set PreferredToolArchitecture, because

  • then I get the 64bit native version of the compilers
  • AFAIR I needed that in case of clang-cl, because it otherwise defaulted to its 32bit version, which uses -m32 by default.
    Except when I explicitly set LLVMInstallDir and LLVMToolsVersion to use e.g. clang-cl of llvm-19.1.6-x86_64 to try your tailcall improvements, because Visual Studio 2022 bundles llvm-18.1.8.
  • and due to a very small personal nit: I do not like _freeze_module.exe being placed into PCbuild\win32 and its object files into PCbuild\obj\314win32_Release instead of PCbuild\amd64 and PCbuild\obj\314amd64_Release (supposedly a side effect of using <Platform>$(PreferredToolArchitecture)</Platform> for <FreezeProjects> since https://github.com/python/cpython/pull/28491/files).

@chris-eibl
Contributor Author

> Thanks! Just so I understand correctly: this PR does not break the existing MSBuild backend right?

Oh, the build fleet started and is green, too. So this should confirm my local tests above?

OOC, why did that happen? I thought they won't run for draft PRs?
Neither did I see any core dev triggering them ...
... so most probably my assumption about draft PRs is just wrong :)

@@ -420,6 +420,7 @@
<ClCompile Include="..\Modules\blake2module.c">
<PreprocessorDefinitions Condition="'$(Platform)' == 'x64'">HACL_CAN_COMPILE_SIMD128;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<PreprocessorDefinitions Condition="'$(Platform)' == 'x64'">HACL_CAN_COMPILE_SIMD256;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<AdditionalOptions>/arch:AVX</AdditionalOptions>
Member

This should be removed right? It's not always true.

Contributor Author

My clang-cl builds error without that and it does not hurt MSVC AFAICT.
It is also used for Hacl_Hash_Blake2b_Simd256.c and Hacl_Hash_Blake2s_Simd128.c, but there with Condition="'$(Platform)' == 'x64'".

I think I should add that condition here, too.

If in doubt about MSVC, I could furthermore add Condition="$(PlatformToolset) == 'ClangCL'".

@@ -716,6 +717,18 @@
<Delete Files="$(IntDir)pyconfig.h;$(OutDir)pyconfig.h" />
</Target>

<Target Name="EnsureClangProfileDir" BeforeTargets="PrepareForBuild"
Contributor Author

I think, we should not only ensure the profile dir, but empty it.

But most probably only when a full rebuild / clean is done.

@Fidget-Spinner
Member

@chris-eibl the Windows build expert for CPython has given their approval provided we don't affect any of the standard build.bat options https://discuss.python.org/t/introducing-clang-cl-as-an-alternative-msbuild-backend/80078/13

@zooba
Member

zooba commented Feb 10, 2025

I haven't looked through the details of how it actually works, but I approve of the approach to making changes. (Most contributors to the build files make things actively worse in some ways, but it looks like you've done a great job and avoided it!)

@chris-eibl
Contributor Author

@zooba Maybe you can help me where and how to best place the cleaning of the clang profile dir?

Regarding building the _freeze_module twice in case of PGO builds (commit 26fb51f):

This is quite independent from this PR and I think I should create a separate PR (and issue), since the MSVC build will benefit from it, too?

@Fidget-Spinner
Member

@chris-eibl do the Clean or CleanAll targets work?

@zooba
Member

zooba commented Feb 10, 2025

The best way to handle cleaning is to add any generated files into the FileWrites item group when they're created. That should store them for cleaning automatically. <FileWrites Include="$(ClangProfileDir)\**\*" /> should do it. It doesn't always work though, so attaching it to a new target that has BeforeTargets="Clean" should be fine.

Avoiding the rebuild of _freeze_module and the refreezing would be nice, but it's not a huge deal (a few seconds in a ~hour long release process). Don't mix it in with this PR.

I've previously gotten compile errors from clang, because the needed
intrinsics were not available without that option.

Cannot reproduce anymore. Most probably, because I've upgraded to
Visual Studio 17.13.0 Preview 5.0, which now ships with clang 19.1.1
instead of 18.1.8 and they've done that for compatibility with MSVC?

Anyway, let's keep the PR small :)
This reverts commit 26fb51f.

Shall be done in a separate PR.
This better matches the behaviour of build.bat in case of MSVC PGO builds.
@chris-eibl
Contributor Author

Since build.bat -t CleanAll does not clean anything in my build dir (not even object files), I went a different route.
Like build.bat --pgo always removes the MSVC profile data using

if "%do_pgo%"=="true" (
    del /s "%dir%\*.pgc"

I now always empty the profile dir. Seems anyway safer to me, so older data cannot kick in (although the names of those files have not yet changed for me - but better safe than sorry).
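
For readers who want to see what "always empty the profile dir" amounts to, here is a small Python sketch of the idea. It is not the MSBuild logic from this PR: the function name `empty_profile_dir` and the file name `stale.profraw` are hypothetical, chosen only for the demonstration.

```python
import shutil
import tempfile
from pathlib import Path


def empty_profile_dir(profile_dir: Path) -> None:
    """Remove everything inside profile_dir, creating the directory if missing."""
    profile_dir.mkdir(parents=True, exist_ok=True)
    for entry in profile_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # nested directory with old profile data
        else:
            entry.unlink()         # plain file or symlink


# Demonstration in a throwaway directory (paths are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    profiles = Path(tmp) / "__clang_profiles"
    profiles.mkdir()
    (profiles / "stale.profraw").write_text("old data")
    empty_profile_dir(profiles)
    remaining = list(profiles.iterdir())

print(remaining)
```

The directory itself is kept, so the instrumented binaries still have a place to write fresh profile data, while stale data from an earlier run cannot leak into the next PGO step.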

@zooba
Member

zooba commented Feb 12, 2025

> Like build.bat --pgo always removes the MSVC profile data using

Ah yeah. This is the same kind of cleanup as that, so you'll want to do it the same way. The MSBuild clean would delete critical files for the second stage.

@chris-eibl
Contributor Author

chris-eibl commented Feb 13, 2025

So this seems ready to me to pull out of draft.
Going to create an issue and blurb it.
Find a reviewer. @zooba would you be willing to review?

In case there is more to do before converting from draft, please let me know.

IMHO, #130040 should be merged first, so we have CI coverage that the non-PGO clang-cl build is still fine after this PR.

PS: sorry @Fidget-Spinner, I didn't want to ask for a review already. It happened accidentally when trying to find the "button" where I could convert from draft. Seems I cannot do that?
And can't undo my review request - most probably brain damage on my side?

@chris-eibl chris-eibl marked this pull request as ready for review February 13, 2025 06:26
@chris-eibl chris-eibl requested a review from a team as a code owner February 13, 2025 06:26
@chris-eibl
Contributor Author

Ah - found the button to convert from draft. That automatically asked for a review from python/windowsteam.
For sure Steve is a member there.

Just cannot convert back to draft now. But I think it is okay anyway.

Will go to merge with main, etc.

Sorry for the churn, I am not yet fully acquainted with the procedure here - still learning ...

@Fidget-Spinner
Member

Thanks! Can you open an issue and link it to this PR by adding it to the title? Something like gh-XXX: <Title> where XXX is the issue number.

We usually need issues for nontrivial changes in case we need to deliberate further.

Member

@zooba zooba left a comment

More proposed changes than I expected, but it's mostly organisation :) I'm impressed it's actually this simple.

@chris-eibl
Contributor Author

@zooba thank you so much for your suggestions.
Will start with pyproject-clangcl.props.
Seems to be a low hanging fruit and a nice cleanup :)

@chris-eibl
Contributor Author

Works for me either way, just let's not break the build.
IMHO this should not be part of this PR and the discussion should be in #130213?

@chris-eibl
Contributor Author

I've put something together in readme.txt in 4ad2365. Since I am not a native speaker, there is for sure room for improvement :)

@zooba: PS: after so much help, can I mention you in Misc/NEWS, too?

@chris-eibl
Contributor Author

> Avoiding the rebuild of _freeze_module and the refreezing would be nice, but it's not a huge deal (a few seconds in a ~hour long release process). Don't mix it in with this PR.

I have created a separate issue #130419 and PR #130420 (clang-cl needs almost a minute on my dusty PC, MSVC is indeed much faster).

Member

@Fidget-Spinner Fidget-Spinner left a comment

Just some small comments for the instructions.

Member

@zooba zooba left a comment

Couple of readme suggestions, but I think we're nearly there!

All other build.bat options continue to work as with MSVC, so this
will create a 64bit release binary. PreferredToolArchitecture is needed,
because msbuild by default selects the 32bit architecture of the toolset,
which uses -m32 as the default target architecture.
Member

If this is genuinely required, let's add a <PreferredToolArchitecture Condition="$(PROCESSOR_ARCHITECTURE) == 'AMD64'">x64</PreferredToolArchitecture> to the clangcl.props. I prefer to not override MSVC defaults for this property, but if Clang can't handle cross-compiling, then provided we know that it's been requested we may as well override defaults.

Contributor Author

TBH, I don't know why I need it when building with msbuild. I do not need it in case of the IDE or when explicitly using a custom installed toolset (most probably because I've always installed 64bit versions of those). I like your suggestion and will try it out - looks promising!

Contributor Author

> looks promising!

But did not work out: most probably, we're "too late" when setting that, meaning that these lines in Microsoft.Cpp.ClangCl.Common.props are already "resolved":

    <_DefaultLLVMInstallDir Condition="'$(_DefaultLLVMInstallDir)' == '' AND '$(PreferredToolArchitecture)' == 'arm64'">$(VsInstallRoot)\VC\Tools\Llvm\ARM64</_DefaultLLVMInstallDir>
    <_DefaultLLVMInstallDir Condition="'$(_DefaultLLVMInstallDir)' == '' AND '$(PreferredToolArchitecture)' == 'x64'">$(VsInstallRoot)\VC\Tools\Llvm\x64</_DefaultLLVMInstallDir>
    <_DefaultLLVMInstallDir Condition="'$(_DefaultLLVMInstallDir)' == '' AND '$(PreferredToolArchitecture)' != 'x64'">$(VsInstallRoot)\VC\Tools\Llvm</_DefaultLLVMInstallDir>
    <LLVMInstallDir Condition="'$(LLVMInstallDir)' == ''">$(_DefaultLLVMInstallDir)</LLVMInstallDir>
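
As a rough model of the fallback chain quoted above, the following Python sketch evaluates the three `_DefaultLLVMInstallDir` conditions in order. It is only an illustration under assumptions: the function name `default_llvm_install_dir` is invented, `$(VsInstallRoot)` is left as a literal placeholder, and the case-insensitive comparison reflects how MSBuild conditions are commonly understood to compare strings.

```python
def default_llvm_install_dir(preferred_tool_architecture: str,
                             vs_install_root: str = r"$(VsInstallRoot)") -> str:
    """Model the three conditional assignments, evaluated top to bottom."""
    arch = preferred_tool_architecture.casefold()  # assumed case-insensitive compare
    base = vs_install_root + r"\VC\Tools\Llvm"
    default = ""  # the property starts out empty in this sketch
    if default == "" and arch == "arm64":
        default = base + r"\ARM64"
    if default == "" and arch == "x64":
        default = base + r"\x64"
    if default == "" and arch != "x64":
        default = base  # fallback: the toolset directly under Llvm
    return default


for arch in ("", "x86", "x64", "arm64"):
    print(repr(arch), "->", default_llvm_install_dir(arch))
```

The sketch shows why the value of PreferredToolArchitecture has to be known before these lines are evaluated: once `_DefaultLLVMInstallDir` has been filled in from an empty or non-x64 value, a later assignment to PreferredToolArchitecture no longer changes which toolset directory was picked.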

Anyway, using the PreferredToolArchitecture as a way to steer the architecture of the generated binaries was a bad idea. So to get a 32bit binary, one would have to use "/p:PreferredToolArchitecture=x86" :-O

clang-cl of course can cross-compile, so let's explicitly set the architecture based on $(Platform).

Comment on lines +84 to +86
"/p:CLANG_PROFILE_PATH=<relative-path-to-instrumented-dir-on-remote-host>"
in the PGInstrument step to make sure the profile data is generated
into the instrumented directory when running the PGO task.
Member

I think this needs a worked example. Just from this description, I couldn't confidently specify the parameter and trust that I got it right.

@@ -39,6 +39,8 @@
<ItemDefinitionGroup>
<ClCompile>
<AdditionalOptions>-Wno-deprecated-non-prototype -Wno-unused-label -Wno-pointer-sign -Wno-incompatible-pointer-types-discards-qualifiers -Wno-unused-function %(AdditionalOptions)</AdditionalOptions>
<AdditionalOptions Condition="'$(Platform)' == 'Win32'">-m32 %(AdditionalOptions)</AdditionalOptions>
<AdditionalOptions Condition="'$(Platform)' == 'x64'">-m64 %(AdditionalOptions)</AdditionalOptions>
Member

Is there an ARM64 option as well?

Member

I'm glad you found this option, though! Much neater way to do it (and a bit embarrassing that the built-in targets don't do it...).

Contributor Author

> Is there an ARM64 option as well?

I think so, yes. I see some hits in the net, and did a clang-cl /?, which e.g. reveals option /arm64EC.

But I do not know what to actually write here in case of '$(Platform)'=='ARM64' - and more importantly how to test it, since I have no arm based device.

I think we should leave that for another PR of someone who does?

Member

Sure. @brandtbucher might be that someone. But the conditions here will just not set an option on that platform, so it should work the same as today.
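
The effect of the two conditional `<AdditionalOptions>` entries discussed in this thread can be sketched in Python as a simple lookup. The names `ARCH_FLAGS` and `additional_options` are hypothetical; the sketch only mirrors the two conditions shown in the diff and deliberately has no entry for ARM64.

```python
# Mirrors the two conditions from the diff: Win32 -> -m32, x64 -> -m64.
ARCH_FLAGS = {"Win32": "-m32", "x64": "-m64"}


def additional_options(platform: str, inherited: str = "") -> str:
    """Prepend the architecture flag, if any, to the inherited options."""
    flag = ARCH_FLAGS.get(platform)  # None for platforms without a condition
    parts = [part for part in (flag, inherited) if part]
    return " ".join(parts)


for platform in ("Win32", "x64", "ARM64"):
    print(platform, "->", repr(additional_options(platform, "-Wno-unused-label")))
```

For ARM64 neither condition matches, so the inherited options pass through unchanged, which is the "works the same as today" behavior described above.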

Co-authored-by: Steve Dower <[email protected]>
@zooba
Member

zooba commented Mar 4, 2025

Okay, let's take it! It's not going to hold up any releases, so if it needs fixing then people will find it as they're testing.

@zooba zooba merged commit d8a1cf4 into python:main Mar 4, 2025
39 checks passed
@zanieb
Contributor

zanieb commented Mar 4, 2025

Can you share some numbers on the statements:

  • running the instrumented pgo tests is much faster
  • the PGO step is faster

@chris-eibl
Contributor Author

Oh sorry, I don't have hard numbers for that. I never timed the PGInstrument / gather profile data / PGUpdate steps (i.e. the build) with a stopwatch.

I just see the "pgo task" -m test --pgo being much faster when gathering the profile data - no stop watch needed to "feel" the difference.

Maybe this can be seen in the CI build logs - there is an option to enable timestamps for the log entries.

If you are interested in pyperformance benchmarks of the built binaries, though, please see the issue and the links to my github gist where I have plenty of data.

@chris-eibl
Contributor Author

chris-eibl commented Mar 4, 2025

msbuild does report timings at the end of the PGInstrument and PGUpdate build steps, but I did not note them down for the various builds I've done - I was just too interested in the pyperformance data ...

@mdboom
Contributor

mdboom commented Mar 5, 2025

I just benchmarked this on the Faster CPython team's benchmarking hardware, which has the Visual Studio preview channel, which is currently:

  • MSVC 19.43.34808
  • clang 19.1 (installed through the clangcl module in the Visual Studio installer)

The build with clang+pgo is about 19% faster overall vs. msvc+pgo. Notably, however, some benchmarks clearly have a major regression, so it's not a "win across the board". And of course, I'm only talking about performance and not all of the other myriad of issues that come with changing the compiler.

[image: benchmark results]

@chris-eibl
Contributor Author

chris-eibl commented Mar 5, 2025

Thanks @mdboom for your results! Always glad to see other benchmarks. This fits my results presented on https://gist.github.com/chris-eibl/114a42f22563956fdb5cd0335b28c7ae, too. If I compare the msvc.pgo.9db1a297d9 (Microsoft Visual Studio 2022 17.13.0 Preview 5.0) and clang.pgo.9db1a297d9 (clang-cl 19.1.1) directly, I end up with an even smaller speedup between those two:

+----------------------------------+---------------------+------------------------+
| Benchmark                        | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9   |
+==================================+=====================+========================+
+----------------------------------+---------------------+------------------------+
| Geometric mean                   | (ref)               | 1.15x faster           |
+----------------------------------+---------------------+------------------------+

And like in your results above, create_gc_cycles and gc_traversal are the clear losers. Very curious why that might be the case. Same for subparsers and many_optionals, which aren't part of the pyperformance suite 1.11.0 I use (the latest released version on pypi) - most probably you are using the latest stable version from github?
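
As a hedged illustration of how the "x faster" figures and the geometric mean in such comparison tables relate to the raw timings, the sketch below recomputes the ratio for three rows of the detailed table (unpack_sequence, logging_silent, scimark_fft). It assumes the ratio is simply reference time divided by new time; the geometric mean printed here covers only these three rows, so it is not expected to reproduce the 1.15x figure for the whole suite.

```python
from statistics import geometric_mean

# (msvc.pgo time, clang.pgo time) taken from the detailed table; units cancel out.
timings = {
    "unpack_sequence": (84.8, 59.3),   # ns
    "logging_silent": (152.0, 109.0),  # ns
    "scimark_fft": (493.0, 358.0),     # ms
}

# Speedup = reference time / new time; values above 1 mean clang.pgo is faster.
speedups = {name: ref / new for name, (ref, new) in timings.items()}
for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x faster")

# The geometric mean is the usual way to average ratios across benchmarks.
subset_mean = geometric_mean(list(speedups.values()))
print(f"geometric mean of this subset: {subset_mean:.2f}x faster")
```

Because a geometric mean multiplies ratios rather than adding them, a few regressions (such as the gc benchmarks mentioned above) pull the overall figure down without being hidden by the large wins.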

Like I've written on dpo:
I am not pushing on anything here. I just want to give people who are eager to try clang-cl the opportunity to try PGO as well :)

Interestingly, clang 18.1.8 is faster than 19.1.1. Tailcalling always gains speed.

Details

+----------------------------------+---------------------+------------------------+
| Benchmark                        | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9   |
+==================================+=====================+========================+
| pickle_dict                      | 43.2 us             | 27.6 us: 1.56x faster  |
+----------------------------------+---------------------+------------------------+
| unpack_sequence                  | 84.8 ns             | 59.3 ns: 1.43x faster  |
+----------------------------------+---------------------+------------------------+
| logging_silent                   | 152 ns              | 109 ns: 1.39x faster   |
+----------------------------------+---------------------+------------------------+
| scimark_fft                      | 493 ms              | 358 ms: 1.38x faster   |
+----------------------------------+---------------------+------------------------+
| spectral_norm                    | 151 ms              | 110 ms: 1.37x faster   |
+----------------------------------+---------------------+------------------------+
| pickle_list                      | 6.89 us             | 5.05 us: 1.37x faster  |
+----------------------------------+---------------------+------------------------+
| scimark_monte_carlo              | 101 ms              | 74.6 ms: 1.35x faster  |
+----------------------------------+---------------------+------------------------+
| coroutines                       | 36.1 ms             | 26.9 ms: 1.34x faster  |
+----------------------------------+---------------------+------------------------+
| deepcopy_memo                    | 46.8 us             | 34.8 us: 1.34x faster  |
+----------------------------------+---------------------+------------------------+
| nbody                            | 171 ms              | 128 ms: 1.33x faster   |
+----------------------------------+---------------------+------------------------+
| richards_super                   | 74.7 ms             | 56.2 ms: 1.33x faster  |
+----------------------------------+---------------------+------------------------+
| comprehensions                   | 25.2 us             | 19.2 us: 1.31x faster  |
+----------------------------------+---------------------+------------------------+
| unpickle_pure_python             | 336 us              | 257 us: 1.31x faster   |
+----------------------------------+---------------------+------------------------+
| richards                         | 64.7 ms             | 49.7 ms: 1.30x faster  |
+----------------------------------+---------------------+------------------------+
| hexiom                           | 9.22 ms             | 7.11 ms: 1.30x faster  |
+----------------------------------+---------------------+------------------------+
| deltablue                        | 4.92 ms             | 3.80 ms: 1.30x faster  |
+----------------------------------+---------------------+------------------------+
| go                               | 170 ms              | 132 ms: 1.29x faster   |
+----------------------------------+---------------------+------------------------+
| raytrace                         | 414 ms              | 321 ms: 1.29x faster   |
+----------------------------------+---------------------+------------------------+
| scimark_sor                      | 195 ms              | 151 ms: 1.29x faster   |
+----------------------------------+---------------------+------------------------+
| pickle                           | 19.1 us             | 15.0 us: 1.28x faster  |
+----------------------------------+---------------------+------------------------+
| unpickle_list                    | 6.87 us             | 5.38 us: 1.28x faster  |
+----------------------------------+---------------------+------------------------+
| nqueens                          | 131 ms              | 103 ms: 1.27x faster   |
+----------------------------------+---------------------+------------------------+
| crypto_pyaes                     | 109 ms              | 86.3 ms: 1.26x faster  |
+----------------------------------+---------------------+------------------------+
| deepcopy                         | 388 us              | 309 us: 1.26x faster   |
+----------------------------------+---------------------+------------------------+
| pyflate                          | 668 ms              | 537 ms: 1.24x faster   |
+----------------------------------+---------------------+------------------------+
| scimark_lu                       | 164 ms              | 132 ms: 1.24x faster   |
+----------------------------------+---------------------+------------------------+
| django_template                  | 52.1 ms             | 42.1 ms: 1.24x faster  |
+----------------------------------+---------------------+------------------------+
| genshi_text                      | 32.5 ms             | 26.3 ms: 1.24x faster  |
+----------------------------------+---------------------+------------------------+
| fannkuch                         | 637 ms              | 516 ms: 1.23x faster   |
+----------------------------------+---------------------+------------------------+
| generators                       | 44.4 ms             | 36.0 ms: 1.23x faster  |
+----------------------------------+---------------------+------------------------+
| pickle_pure_python               | 463 us              | 378 us: 1.22x faster   |
+----------------------------------+---------------------+------------------------+
| chaos                            | 90.8 ms             | 74.3 ms: 1.22x faster  |
+----------------------------------+---------------------+------------------------+
| deepcopy_reduce                  | 3.95 us             | 3.23 us: 1.22x faster  |
+----------------------------------+---------------------+------------------------+
| scimark_sparse_mat_mult          | 6.06 ms             | 5.01 ms: 1.21x faster  |
+----------------------------------+---------------------+------------------------+
| tomli_loads                      | 2.88 sec            | 2.38 sec: 1.21x faster |
+----------------------------------+---------------------+------------------------+
| float                            | 116 ms              | 96.8 ms: 1.20x faster  |
+----------------------------------+---------------------+------------------------+
| async_tree_eager                 | 160 ms              | 133 ms: 1.20x faster   |
+----------------------------------+---------------------+------------------------+
| sqlglot_parse                    | 1.81 ms             | 1.51 ms: 1.20x faster  |
+----------------------------------+---------------------+------------------------+
| sqlglot_transpile                | 2.21 ms             | 1.85 ms: 1.19x faster  |
+----------------------------------+---------------------+------------------------+
| genshi_xml                       | 74.6 ms             | 63.1 ms: 1.18x faster  |
+----------------------------------+---------------------+------------------------+
| pprint_safe_repr                 | 1.09 sec            | 934 ms: 1.17x faster   |
+----------------------------------+---------------------+------------------------+
| pprint_pformat                   | 2.23 sec            | 1.91 sec: 1.17x faster |
+----------------------------------+---------------------+------------------------+
| json_dumps                       | 15.0 ms             | 12.9 ms: 1.16x faster  |
+----------------------------------+---------------------+------------------------+
| typing_runtime_protocols         | 223 us              | 193 us: 1.16x faster   |
+----------------------------------+---------------------+------------------------+
| coverage                         | 120 ms              | 103 ms: 1.16x faster   |
+----------------------------------+---------------------+------------------------+
| mako                             | 16.7 ms             | 14.4 ms: 1.16x faster  |
+----------------------------------+---------------------+------------------------+
| xml_etree_process                | 94.4 ms             | 82.0 ms: 1.15x faster  |
+----------------------------------+---------------------+------------------------+
| regex_compile                    | 180 ms              | 157 ms: 1.15x faster   |
+----------------------------------+---------------------+------------------------+
| telco                            | 10.7 ms             | 9.37 ms: 1.15x faster  |
+----------------------------------+---------------------+------------------------+
| sqlglot_normalize                | 151 ms              | 131 ms: 1.15x faster   |
+----------------------------------+---------------------+------------------------+
| xml_etree_generate               | 135 ms              | 119 ms: 1.13x faster   |
+----------------------------------+---------------------+------------------------+
| regex_v8                         | 33.7 ms             | 29.8 ms: 1.13x faster  |
+----------------------------------+---------------------+------------------------+
| sqlglot_optimize                 | 74.5 ms             | 66.2 ms: 1.13x faster  |
+----------------------------------+---------------------+------------------------+
| json_loads                       | 36.8 us             | 32.7 us: 1.13x faster  |
+----------------------------------+---------------------+------------------------+
| meteor_contest                   | 139 ms              | 124 ms: 1.13x faster   |
+----------------------------------+---------------------+------------------------+
| async_generators                 | 577 ms              | 514 ms: 1.12x faster   |
+----------------------------------+---------------------+------------------------+
| sympy_integrate                  | 27.1 ms             | 24.2 ms: 1.12x faster  |
+----------------------------------+---------------------+------------------------+
| mdp                              | 3.76 sec            | 3.37 sec: 1.12x faster |
+----------------------------------+---------------------+------------------------+
| sympy_str                        | 383 ms              | 344 ms: 1.11x faster   |
+----------------------------------+---------------------+------------------------+
| async_tree_memoization           | 509 ms              | 458 ms: 1.11x faster   |
+----------------------------------+---------------------+------------------------+
| sympy_expand                     | 640 ms              | 578 ms: 1.11x faster   |
+----------------------------------+---------------------+------------------------+
| unpickle                         | 19.8 us             | 17.9 us: 1.11x faster  |
+----------------------------------+---------------------+------------------------+
| logging_simple                   | 13.5 us             | 12.2 us: 1.11x faster  |
+----------------------------------+---------------------+------------------------+
| async_tree_none                  | 394 ms              | 357 ms: 1.10x faster   |
+----------------------------------+---------------------+------------------------+
| sqlite_synth                     | 3.75 us             | 3.44 us: 1.09x faster  |
+----------------------------------+---------------------+------------------------+
| async_tree_memoization_tg        | 462 ms              | 425 ms: 1.09x faster   |
+----------------------------------+---------------------+------------------------+
| async_tree_io_tg                 | 877 ms              | 807 ms: 1.09x faster   |
+----------------------------------+---------------------+------------------------+
| logging_format                   | 14.7 us             | 13.6 us: 1.09x faster  |
+----------------------------------+---------------------+------------------------+
| async_tree_none_tg               | 382 ms              | 352 ms: 1.09x faster   |
+----------------------------------+---------------------+------------------------+
| async_tree_eager_memoization     | 304 ms              | 281 ms: 1.08x faster   |
+----------------------------------+---------------------+------------------------+
| async_tree_eager_tg              | 321 ms              | 297 ms: 1.08x faster   |
+----------------------------------+---------------------+------------------------+
| 2to3                             | 462 ms              | 426 ms: 1.08x faster   |
+----------------------------------+---------------------+------------------------+
| regex_effbot                     | 3.66 ms             | 3.39 ms: 1.08x faster  |
+----------------------------------+---------------------+------------------------+
| async_tree_io                    | 889 ms              | 824 ms: 1.08x faster   |
+----------------------------------+---------------------+------------------------+
| async_tree_cpu_io_mixed_tg       | 716 ms              | 665 ms: 1.08x faster   |
+----------------------------------+---------------------+------------------------+
| async_tree_eager_memoization_tg  | 427 ms              | 397 ms: 1.07x faster   |
+----------------------------------+---------------------+------------------------+
| async_tree_cpu_io_mixed          | 749 ms              | 697 ms: 1.07x faster   |
+----------------------------------+---------------------+------------------------+
| async_tree_eager_io              | 874 ms              | 817 ms: 1.07x faster   |
+----------------------------------+---------------------+------------------------+
| sympy_sum                        | 213 ms              | 199 ms: 1.07x faster   |
+----------------------------------+---------------------+------------------------+
| async_tree_eager_io_tg           | 898 ms              | 840 ms: 1.07x faster   |
+----------------------------------+---------------------+------------------------+
| async_tree_eager_cpu_io_mixed    | 567 ms              | 535 ms: 1.06x faster   |
+----------------------------------+---------------------+------------------------+
| docutils                         | 3.50 sec            | 3.31 sec: 1.06x faster |
+----------------------------------+---------------------+------------------------+
| xml_etree_iterparse              | 154 ms              | 145 ms: 1.06x faster   |
+----------------------------------+---------------------+------------------------+
| async_tree_eager_cpu_io_mixed_tg | 681 ms              | 646 ms: 1.05x faster   |
+----------------------------------+---------------------+------------------------+
| html5lib                         | 77.9 ms             | 74.5 ms: 1.05x faster  |
+----------------------------------+---------------------+------------------------+
| pidigits                         | 250 ms              | 240 ms: 1.04x faster   |
+----------------------------------+---------------------+------------------------+
| bench_thread_pool                | 1.68 ms             | 1.63 ms: 1.03x faster  |
+----------------------------------+---------------------+------------------------+
| asyncio_websockets               | 758 ms              | 740 ms: 1.02x faster   |
+----------------------------------+---------------------+------------------------+
| python_startup_no_site           | 35.4 ms             | 35.9 ms: 1.01x slower  |
+----------------------------------+---------------------+------------------------+
| pathlib                          | 256 ms              | 262 ms: 1.02x slower   |
+----------------------------------+---------------------+------------------------+
| xml_etree_parse                  | 200 ms              | 210 ms: 1.05x slower   |
+----------------------------------+---------------------+------------------------+
| bench_mp_pool                    | 177 ms              | 190 ms: 1.07x slower   |
+----------------------------------+---------------------+------------------------+
| asyncio_tcp                      | 1.48 sec            | 1.61 sec: 1.09x slower |
+----------------------------------+---------------------+------------------------+
| create_gc_cycles                 | 1.56 ms             | 1.71 ms: 1.10x slower  |
+----------------------------------+---------------------+------------------------+
| gc_traversal                     | 4.02 ms             | 5.71 ms: 1.42x slower  |
+----------------------------------+---------------------+------------------------+
| Geometric mean                   | (ref)               | 1.15x faster           |
+----------------------------------+---------------------+------------------------+

@chris-eibl
Contributor Author

As currently discussed in #128718, clang 18.1.8 is faster than 19.1.1, and 20.1.0.rc2 with tailcalling is the fastest:

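+----------------+---------------------+-----------------------------+----------------------+------------------------------------+
| Benchmark      | msvc.pgo.9db1a297d9 | clang.pgo.18.1.8.9db1a297d9 | clang.pgo.9db1a297d9 | clang.pgo.tc.20.1.0.rc2.9db1a297d9 |
+================+=====================+=============================+======================+====================================+
| Geometric mean | (ref)               | 1.19x faster                | 1.15x faster         | 1.25x faster                       |
+----------------+---------------------+-----------------------------+----------------------+------------------------------------+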

But even for 20.1.0.rc2 with PGO+tailcalling, gc_traversal is 1.31x slower than MSVC PGO:

Details

+----------------------------------+---------------------+------------------------------------+
| Benchmark                        | msvc.pgo.9db1a297d9 | clang.pgo.tc.20.1.0.rc2.9db1a297d9 |
+==================================+=====================+====================================+
| scimark_fft                      | 493 ms              | 315 ms: 1.56x faster               |
+----------------------------------+---------------------+------------------------------------+
| scimark_sor                      | 195 ms              | 126 ms: 1.55x faster               |
+----------------------------------+---------------------+------------------------------------+
| pickle_dict                      | 43.2 us             | 27.9 us: 1.55x faster              |
+----------------------------------+---------------------+------------------------------------+
| spectral_norm                    | 151 ms              | 99.4 ms: 1.51x faster              |
+----------------------------------+---------------------+------------------------------------+
| logging_silent                   | 152 ns              | 101 ns: 1.51x faster               |
+----------------------------------+---------------------+------------------------------------+
| unpack_sequence                  | 84.8 ns             | 57.2 ns: 1.48x faster              |
+----------------------------------+---------------------+------------------------------------+
| deltablue                        | 4.92 ms             | 3.40 ms: 1.45x faster              |
+----------------------------------+---------------------+------------------------------------+
| coroutines                       | 36.1 ms             | 24.9 ms: 1.45x faster              |
+----------------------------------+---------------------+------------------------------------+
| nbody                            | 171 ms              | 118 ms: 1.45x faster               |
+----------------------------------+---------------------+------------------------------------+
| scimark_monte_carlo              | 101 ms              | 69.6 ms: 1.45x faster              |
+----------------------------------+---------------------+------------------------------------+
| comprehensions                   | 25.2 us             | 17.6 us: 1.43x faster              |
+----------------------------------+---------------------+------------------------------------+
| unpickle_pure_python             | 336 us              | 235 us: 1.43x faster               |
+----------------------------------+---------------------+------------------------------------+
| richards_super                   | 74.7 ms             | 52.2 ms: 1.43x faster              |
+----------------------------------+---------------------+------------------------------------+
| nqueens                          | 131 ms              | 92.0 ms: 1.42x faster              |
+----------------------------------+---------------------+------------------------------------+
| deepcopy_memo                    | 46.8 us             | 33.2 us: 1.41x faster              |
+----------------------------------+---------------------+------------------------------------+
| raytrace                         | 414 ms              | 294 ms: 1.41x faster               |
+----------------------------------+---------------------+------------------------------------+
| richards                         | 64.7 ms             | 45.9 ms: 1.41x faster              |
+----------------------------------+---------------------+------------------------------------+
| fannkuch                         | 637 ms              | 456 ms: 1.40x faster               |
+----------------------------------+---------------------+------------------------------------+
| go                               | 170 ms              | 122 ms: 1.40x faster               |
+----------------------------------+---------------------+------------------------------------+
| crypto_pyaes                     | 109 ms              | 78.0 ms: 1.40x faster              |
+----------------------------------+---------------------+------------------------------------+
| pickle_list                      | 6.89 us             | 4.94 us: 1.40x faster              |
+----------------------------------+---------------------+------------------------------------+
| hexiom                           | 9.22 ms             | 6.61 ms: 1.39x faster              |
+----------------------------------+---------------------+------------------------------------+
| chaos                            | 90.8 ms             | 65.4 ms: 1.39x faster              |
+----------------------------------+---------------------+------------------------------------+
| scimark_lu                       | 164 ms              | 119 ms: 1.38x faster               |
+----------------------------------+---------------------+------------------------------------+
| tomli_loads                      | 2.88 sec            | 2.13 sec: 1.36x faster             |
+----------------------------------+---------------------+------------------------------------+
| unpickle_list                    | 6.87 us             | 5.08 us: 1.35x faster              |
+----------------------------------+---------------------+------------------------------------+
| generators                       | 44.4 ms             | 33.0 ms: 1.35x faster              |
+----------------------------------+---------------------+------------------------------------+
| pprint_safe_repr                 | 1.09 sec            | 812 ms: 1.35x faster               |
+----------------------------------+---------------------+------------------------------------+
| deepcopy                         | 388 us              | 289 us: 1.35x faster               |
+----------------------------------+---------------------+------------------------------------+
| genshi_text                      | 32.5 ms             | 24.2 ms: 1.34x faster              |
+----------------------------------+---------------------+------------------------------------+
| pprint_pformat                   | 2.23 sec            | 1.66 sec: 1.34x faster             |
+----------------------------------+---------------------+------------------------------------+
| pickle                           | 19.1 us             | 14.3 us: 1.34x faster              |
+----------------------------------+---------------------+------------------------------------+
| sqlglot_parse                    | 1.81 ms             | 1.35 ms: 1.34x faster              |
+----------------------------------+---------------------+------------------------------------+
| django_template                  | 52.1 ms             | 39.0 ms: 1.34x faster              |
+----------------------------------+---------------------+------------------------------------+
| float                            | 116 ms              | 87.6 ms: 1.33x faster              |
+----------------------------------+---------------------+------------------------------------+
| pickle_pure_python               | 463 us              | 350 us: 1.32x faster               |
+----------------------------------+---------------------+------------------------------------+
| pyflate                          | 668 ms              | 506 ms: 1.32x faster               |
+----------------------------------+---------------------+------------------------------------+
| async_tree_eager                 | 160 ms              | 121 ms: 1.32x faster               |
+----------------------------------+---------------------+------------------------------------+
| regex_compile                    | 180 ms              | 137 ms: 1.32x faster               |
+----------------------------------+---------------------+------------------------------------+
| json_dumps                       | 15.0 ms             | 11.5 ms: 1.30x faster              |
+----------------------------------+---------------------+------------------------------------+
| genshi_xml                       | 74.6 ms             | 57.2 ms: 1.30x faster              |
+----------------------------------+---------------------+------------------------------------+
| scimark_sparse_mat_mult          | 6.06 ms             | 4.64 ms: 1.30x faster              |
+----------------------------------+---------------------+------------------------------------+
| sqlglot_transpile                | 2.21 ms             | 1.70 ms: 1.30x faster              |
+----------------------------------+---------------------+------------------------------------+
| mdp                              | 3.76 sec            | 2.91 sec: 1.29x faster             |
+----------------------------------+---------------------+------------------------------------+
| logging_simple                   | 13.5 us             | 10.4 us: 1.29x faster              |
+----------------------------------+---------------------+------------------------------------+
| coverage                         | 120 ms              | 93.3 ms: 1.28x faster              |
+----------------------------------+---------------------+------------------------------------+
| xml_etree_process                | 94.4 ms             | 73.6 ms: 1.28x faster              |
+----------------------------------+---------------------+------------------------------------+
| deepcopy_reduce                  | 3.95 us             | 3.08 us: 1.28x faster              |
+----------------------------------+---------------------+------------------------------------+
| typing_runtime_protocols         | 223 us              | 175 us: 1.27x faster               |
+----------------------------------+---------------------+------------------------------------+
| logging_format                   | 14.7 us             | 11.6 us: 1.27x faster              |
+----------------------------------+---------------------+------------------------------------+
| mako                             | 16.7 ms             | 13.2 ms: 1.26x faster              |
+----------------------------------+---------------------+------------------------------------+
| sqlglot_normalize                | 151 ms              | 119 ms: 1.26x faster               |
+----------------------------------+---------------------+------------------------------------+
| xml_etree_generate               | 135 ms              | 109 ms: 1.24x faster               |
+----------------------------------+---------------------+------------------------------------+
| sqlglot_optimize                 | 74.5 ms             | 60.0 ms: 1.24x faster              |
+----------------------------------+---------------------+------------------------------------+
| sympy_expand                     | 640 ms              | 518 ms: 1.24x faster               |
+----------------------------------+---------------------+------------------------------------+
| telco                            | 10.7 ms             | 8.70 ms: 1.24x faster              |
+----------------------------------+---------------------+------------------------------------+
| async_tree_eager_memoization     | 304 ms              | 246 ms: 1.24x faster               |
+----------------------------------+---------------------+------------------------------------+
| async_generators                 | 577 ms              | 469 ms: 1.23x faster               |
+----------------------------------+---------------------+------------------------------------+
| sympy_str                        | 383 ms              | 315 ms: 1.22x faster               |
+----------------------------------+---------------------+------------------------------------+
| sympy_integrate                  | 27.1 ms             | 22.4 ms: 1.21x faster              |
+----------------------------------+---------------------+------------------------------------+
| async_tree_memoization           | 509 ms              | 423 ms: 1.20x faster               |
+----------------------------------+---------------------+------------------------------------+
| json_loads                       | 36.8 us             | 30.5 us: 1.20x faster              |
+----------------------------------+---------------------+------------------------------------+
| async_tree_none                  | 394 ms              | 330 ms: 1.19x faster               |
+----------------------------------+---------------------+------------------------------------+
| meteor_contest                   | 139 ms              | 117 ms: 1.19x faster               |
+----------------------------------+---------------------+------------------------------------+
| async_tree_eager_memoization_tg  | 427 ms              | 363 ms: 1.18x faster               |
+----------------------------------+---------------------+------------------------------------+
| sympy_sum                        | 213 ms              | 181 ms: 1.18x faster               |
+----------------------------------+---------------------+------------------------------------+
| async_tree_eager_io              | 874 ms              | 743 ms: 1.18x faster               |
+----------------------------------+---------------------+------------------------------------+
| regex_v8                         | 33.7 ms             | 28.8 ms: 1.17x faster              |
+----------------------------------+---------------------+------------------------------------+
| unpickle                         | 19.8 us             | 16.9 us: 1.17x faster              |
+----------------------------------+---------------------+------------------------------------+
| async_tree_eager_tg              | 321 ms              | 275 ms: 1.17x faster               |
+----------------------------------+---------------------+------------------------------------+
| async_tree_io_tg                 | 877 ms              | 752 ms: 1.17x faster               |
+----------------------------------+---------------------+------------------------------------+
| async_tree_memoization_tg        | 462 ms              | 396 ms: 1.17x faster               |
+----------------------------------+---------------------+------------------------------------+
| async_tree_io                    | 889 ms              | 766 ms: 1.16x faster               |
+----------------------------------+---------------------+------------------------------------+
| 2to3                             | 462 ms              | 398 ms: 1.16x faster               |
+----------------------------------+---------------------+------------------------------------+
| async_tree_none_tg               | 382 ms              | 329 ms: 1.16x faster               |
+----------------------------------+---------------------+------------------------------------+
| async_tree_eager_cpu_io_mixed    | 567 ms              | 492 ms: 1.15x faster               |
+----------------------------------+---------------------+------------------------------------+
| sqlite_synth                     | 3.75 us             | 3.26 us: 1.15x faster              |
+----------------------------------+---------------------+------------------------------------+
| async_tree_eager_io_tg           | 898 ms              | 782 ms: 1.15x faster               |
+----------------------------------+---------------------+------------------------------------+
| async_tree_cpu_io_mixed          | 749 ms              | 652 ms: 1.15x faster               |
+----------------------------------+---------------------+------------------------------------+
| regex_effbot                     | 3.66 ms             | 3.20 ms: 1.14x faster              |
+----------------------------------+---------------------+------------------------------------+
| async_tree_eager_cpu_io_mixed_tg | 681 ms              | 596 ms: 1.14x faster               |
+----------------------------------+---------------------+------------------------------------+
| html5lib                         | 77.9 ms             | 68.6 ms: 1.14x faster              |
+----------------------------------+---------------------+------------------------------------+
| async_tree_cpu_io_mixed_tg       | 716 ms              | 630 ms: 1.14x faster               |
+----------------------------------+---------------------+------------------------------------+
| docutils                         | 3.50 sec            | 3.12 sec: 1.12x faster             |
+----------------------------------+---------------------+------------------------------------+
| xml_etree_iterparse              | 154 ms              | 140 ms: 1.10x faster               |
+----------------------------------+---------------------+------------------------------------+
| pidigits                         | 250 ms              | 233 ms: 1.07x faster               |
+----------------------------------+---------------------+------------------------------------+
| bench_thread_pool                | 1.68 ms             | 1.57 ms: 1.07x faster              |
+----------------------------------+---------------------+------------------------------------+
| asyncio_tcp                      | 1.48 sec            | 1.40 sec: 1.05x faster             |
+----------------------------------+---------------------+------------------------------------+
| dulwich_log                      | 129 ms              | 123 ms: 1.05x faster               |
+----------------------------------+---------------------+------------------------------------+
| asyncio_websockets               | 758 ms              | 723 ms: 1.05x faster               |
+----------------------------------+---------------------+------------------------------------+
| bench_mp_pool                    | 177 ms              | 171 ms: 1.04x faster               |
+----------------------------------+---------------------+------------------------------------+
| pathlib                          | 256 ms              | 250 ms: 1.02x faster               |
+----------------------------------+---------------------+------------------------------------+
| regex_dna                        | 210 ms              | 207 ms: 1.02x faster               |
+----------------------------------+---------------------+------------------------------------+
| gc_traversal                     | 4.02 ms             | 5.28 ms: 1.31x slower              |
+----------------------------------+---------------------+------------------------------------+
| Geometric mean                   | (ref)               | 1.25x faster                       |
+----------------------------------+---------------------+------------------------------------+
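For anyone who wants to sanity-check the "faster"/"slower" labels, here is a minimal sketch that recomputes them from a few rows of the table above. It assumes the summary row is an ordinary geometric mean of the per-benchmark ratios; the helper names `describe` and `geometric_mean_speedup` are made up for this illustration and are not part of pyperf. Because the inputs are the rounded values printed in the table, the last digit can differ from what pyperf reports, and the mean of this small subset is not the 1.25x of the full run.

```python
import math

# (benchmark, MSVC PGO time, clang-cl PGO + tail-calling time), in ms.
# Values are copied from the rounded table above.
ROWS = [
    ("nbody", 171.0, 118.0),
    ("richards", 64.7, 45.9),
    ("pathlib", 256.0, 250.0),
    ("gc_traversal", 4.02, 5.28),
]


def describe(ref: float, new: float) -> str:
    """Label a timing pair in the table's style, e.g. '1.31x slower'."""
    if new <= ref:
        return f"{ref / new:.2f}x faster"
    return f"{new / ref:.2f}x slower"


def geometric_mean_speedup(rows) -> float:
    """Geometric mean of ref/new ratios; a value above 1.0 means faster."""
    logs = [math.log(ref / new) for _, ref, new in rows]
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    for name, ref, new in ROWS:
        print(f"{name:15s} {describe(ref, new)}")
    print(f"subset geometric mean: {geometric_mean_speedup(ROWS):.2f}x")
```

Averaging the logarithms rather than the raw ratios keeps a single outlier such as gc_traversal from dominating the summary in either direction.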

@zooba
Member

zooba commented Mar 7, 2025

@chris-eibl You might be up for this as a project - we're incredibly unlikely to switch the default compiler for our releases, due to other compatibility concerns (not least that many users are explicitly using MSVC for builds/extensions, and would be massively inconvenienced, especially since clang looks very much like a moving target right now).

But there's some possibility that we might be able to use Clang just for the interpreter loop and continue using MSVC for public interfaces (which goes as deep as the entire CRT and our custom memory allocators, IMHO, unless you can somehow prove that Clang does and will always produce identical behaviour to MSVC).

I'm thinking of a single static library project just for the interpreter loop compiled with Clang, which we then link in using MSVC rather than compiling it directly. There are a lot of challenges, but it's going to be the only way (other than persuading MSVC to improve their optimisations, which we're already doing) to get any of these benefits by default any time soon. If you're up for it, I don't know that anyone else is actively working on it right now, so go for it!

There might be an existing issue somewhere - I've only had high-level discussions about it - but if you can't find one then start a new one.

@chris-eibl
Contributor Author

> @chris-eibl You might be up for this as a project

Oh, I think this might be a massive community effort, but yeah, someone somewhere has to start it somehow :)

> we're incredibly unlikely to switch the default compiler for our releases, due to other compatibility concerns (not least that many

Yeah, I immediately agreed with you on that part on dpo - and still do.

Switching to clang-cl for the official builds is far too high a risk, since the user base is so big - and the compatibility is too uncertain.

If e.g. @zanieb lands astral-sh/python-build-standalone#549, we'd get feedback regarding compatibility issues while affecting a much smaller user base.

> But there's some possibility that we might be able to use Clang just for the interpreter loop and continue using MSVC for public interfaces

This is an interesting approach and might help us with compatibility concerns. I remember another project did something similar to get computed gotos (or the like) into an MSVC build using MinGW or clang. AFAIR, they built some (or just one?) *.c files with one compiler and the rest with MSVC. But PGO and LTO are then most certainly problematic.

> (which goes as deep as the entire CRT and our custom memory allocators, IMHO

clang-cl just "shims" the compiler and linker. It #includes the MSVC headers, produces compatible libs and links with the MSVC runtime. I.e., it does not bring its own allocator and the like.

What the pyperformance tests show, at least: the clang-cl artifacts work with MSVC-compiled extensions, since at least some of them come pre-compiled from PyPI (e.g. _psutil_windows.pyd). And when pip had to compile locally for me, it used MSVC to build the extension.
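When collecting feedback like this, it helps to record which compiler produced the interpreter under test. A minimal sketch using only the standard library: `platform.python_compiler()` and `sys.version` are real calls, but the helper names and the idea of detecting clang by a simple substring match are assumptions for illustration, as are the sample strings mentioned below. I have not verified the exact string a clang-cl build reports.

```python
import platform
import sys


def looks_like_clang(compiler: str) -> bool:
    """Heuristic (assumption): treat any compiler string mentioning clang as a clang build."""
    return "clang" in compiler.lower()


def build_summary() -> dict:
    """Collect the details worth attaching to a compatibility report."""
    compiler = platform.python_compiler()
    return {
        "compiler": compiler,
        "clang_build": looks_like_clang(compiler),
        "version": sys.version,
        "platform": sys.platform,
    }


if __name__ == "__main__":
    for key, value in build_summary().items():
        print(f"{key}: {value}")
```

With hypothetical inputs, `looks_like_clang("Clang 18.1.8")` is true and `looks_like_clang("MSC v.1942 64 bit (AMD64)")` is false; what a real clang-cl build prints should be checked before relying on it.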

> unless you can somehow prove that Clang does and will always produce identical behaviour to MSVC

Most probably, the clang team would consider any such difference a bug and try to fix it. I can't speak for them (and maybe I am totally wrong here), but IMHO we cannot risk that dependency without having any measure of the compatibility with respect to the Python user base. There is https://clang.llvm.org/docs/MSVCCompatibility.html, but all the C++ parts there have no effect on Python.

However, just because it compiles and links doesn't mean all extensions out there continue to work without issues.

I think we'll have to take baby steps here and support early adopters by fixing issues they run into - as long as this does not produce too much churn in the code base.

> other than persuading MSVC to improve their optimisations, which we're already doing

Very much appreciated!

> There might be an existing issue somewhere - I've only had high-level discussions about it - but if you can't find one then start a new one.

Yeah, that discussion needs to happen in a separate issue, not this PR. I did an is:issue state:open clang-cl search, but it did not reveal anything suitable.

Maybe you and/or @Fidget-Spinner want to be patron(s) of it, since he was keen on using clang-cl for the official binaries?

OTOH, maybe we should first start on dpo to find out what early adopters need to get started?

Building with clang-cl IMHO is now pretty well supported.

Once we have #130040, there is at least one CI schedule that will help us to not break clang-cl builds.

@chris-eibl
Contributor Author

> Can you share some numbers on the statements:
>
>   • running the instrumented pgo tests is much faster
>   • the PGO step is faster

See #131005 for some numbers.
