-
-
Notifications
You must be signed in to change notification settings - Fork 31.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use -flto=thin for clang-cl on Windows #131005
base: main
Are you sure you want to change the base?
Conversation
for _freeze_module in case of clang-cl to speed up the build
Speeds up both MSVC and clang-cl builds. Should most probably done in a separate PR and issue, though.
I've previously gotten compile errors from clang, because the needed intrinsics were not available without that option. Cannot reproduce anymore. Most probably, because I've upgraded to Visual Studio 17.13.0 Preview 5.0, which now ships with clang 19.1.1 instead of 18.1.8 and they've done that for compatibility with MSVC? Anyway, let's keep the PR small :)
This reverts commit 26fb51f. Shall be done in a separate PR.
This better matches the behaviour of build.bat in case of MSVC PGO builds.
and make it a target with inputs and outputs
because the name is too MSVC specific
Co-authored-by: Steve Dower <[email protected]>
@@ -23,7 +23,7 @@ extern "C" { | |||
declaration \ | |||
_GENERATE_DEBUG_SECTION_LINUX(name) | |||
|
|||
#if defined(MS_WINDOWS) | |||
#if defined(MS_WINDOWS)&& !defined(__clang__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, for debug builds I now need this, too. Never needed before, see #130040 (comment). Will anyway come with that PR...
@@ -420,6 +420,7 @@ | |||
<ClCompile Include="..\Modules\blake2module.c"> | |||
<PreprocessorDefinitions Condition="'$(Platform)' == 'x64'">HACL_CAN_COMPILE_SIMD128;%(PreprocessorDefinitions)</PreprocessorDefinitions> | |||
<PreprocessorDefinitions Condition="'$(Platform)' == 'x64'">HACL_CAN_COMPILE_SIMD256;%(PreprocessorDefinitions)</PreprocessorDefinitions> | |||
<AdditionalOptions Condition="'$(Platform)' == 'x64' and '$(LLVMToolsVersion)' < '19'">/arch:AVX</AdditionalOptions> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,264 @@ | |||
import argparse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a beauty but it creates the tables for me :)
@@ -167,6 +167,30 @@ public override bool Execute() { | |||
</Task> | |||
</UsingTask> | |||
|
|||
<Target Name="BeginTimeStamp" BeforeTargets="PrepareForBuild"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only needed for detailed timings per project. Could be guarded behind a Condition="'$(PrintBuildTimeStamps)' == 'true'"
, in case this shall be merged.
from datetime import datetime, date, time | ||
|
||
# Verstrichene Zeit 00:00:00.74 | ||
msbuild_time_str = "Verstrichene Zeit" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the thing I dislike most: msbuild does this localized :(
This started off as a build time analysis, since @zanieb asked for numbers in #129907 (comment) and this is too painfull to do manually.
So I hacked the two new targets
BeginTimeStamp
andEndTimeStamp
for detailed analysis intopyproject.props
. This is only needed for the detailed timings per project. The overall wall times come for free, since msbuild and-m test --pgo
both print timings.analyse_build_times.py
is still of use here to create the tables.Numbers represent seconds. Please note that the sum of the detailed project times does not match the
total time
, because most of the projects are built in parallel, except_freeze_module
andpython314
. Therefore, I have listed them first in the details table and then sorted the others by build time of the first column.In the pgupdate details we still see
_freez_module
, because #130420 is not on that branch. Hence, we see what that PR saves us.I intentionally branched off commit 9db1a29 to have the same environment.
Debug build times:
Release build times:
PGO build times:
pginstr
phasepgo
as short forpgo task
in the table)kill
is due tocall :Kill
in case ofbuild.bat --pgo
- can be ignored, takes almost no timepgupd
phase takes longer for MSVCbuild.bat --pgo
times are longer for MSVCVery interesting:
pyexpat
_elementtree
take much longer for 20.1.0-rc2 in thepginstr
phase (see details), but come back to "normal" with--flto=thin
. Because these are so outliers, I retested several times with the same result :-O-flto=thin:
Since I now have the timing infrastructure, I wanted to try
-flto=thin
, too. Unsurprisingly, build times are faster.Performance seems neutral:
Detailed pybenchmark results
Maybe this is the reason why it is used in makefile based clang builds, too?
CONFIGURE_CFLAGS_NODIST
andCONFIGURE_LDFLAGS_NOLTO
both use-flto=thin
when I configure for clang in WSL Ubuntu-24.04.What to do with it?
Most probably a skip news. Don't even know, whether we want anything of that branch. Maybe just
-flto=thin
?I'd like others to verify / experiment.
Detailed build times:
Detailed debug build times
Detailed release build times
Details pginstrument build times
Details pgupdate build times