Rearranged struct fields to prevent ldp page crossings #78

dorukkarademirler · 2025-01-28T18:51:58Z

Explanation of Structural Modifications

To enhance performance and reduce data cache pressure, the following structural modifications have been implemented:

Reordering Elements: The structure has been modified to prevent LDP instructions from crossing a 4K page boundary.

Poly Array: Placed at the beginning of the structure.
invln2_scaled: Positioned immediately after the poly array and aligned to 16 bytes.

Alignment Adjustments:

The entire structure is now aligned to 64 bytes to further prevent page crossings.
The tab variable has been moved by 256 bytes. Together with the relatively small 64-byte alignment of the structure, this fixes the page crossing error while minimizing the bytes wasted in the worst case.

These changes collectively contribute to improved performance and reduced data cache pressure.
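As a rough illustration of the layout described above, the following C11 sketch places poly first, puts invln2_scaled directly after it, and aligns the whole structure to 64 bytes. The struct name, the array lengths, and the helper `stays_in_block` are hypothetical and are not taken from the patch; the 256-byte move of tab is not modelled because the thread does not show how it was done. The helper relies on a simple fact: if the base address is a multiple of an alignment that divides 4096, any access that stays inside one alignment-sized block of the struct cannot straddle a 4K page.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes; the real table dimensions are not stated in this PR.  */
#define SKETCH_POLY_LEN 3
#define SKETCH_TAB_LEN 32

/* Illustrative layout: poly first, invln2_scaled directly after it, and the
   whole struct aligned to 64 bytes via the first member.  */
struct exp_data_sketch
{
  alignas (64) double poly[SKETCH_POLY_LEN];
  double invln2_scaled;
  uint64_t tab[SKETCH_TAB_LEN];
};

/* Return 1 if the bytes [offset, offset + size) lie inside one align-sized
   block of a struct whose base address is a multiple of align.  When align
   divides 4096, such an access cannot straddle a 4K page boundary.
   Assumes size >= 1.  */
static int
stays_in_block (size_t offset, size_t size, size_t align)
{
  return offset / align == (offset + size - 1) / align;
}

/* Compile-time checks of the properties the description relies on.  */
static_assert (alignof (struct exp_data_sketch) == 64,
               "struct is 64-byte aligned");
static_assert (offsetof (struct exp_data_sketch, poly) == 0,
               "poly comes first");
static_assert (offsetof (struct exp_data_sketch, invln2_scaled)
                 == sizeof (double) * SKETCH_POLY_LEN,
               "invln2_scaled directly follows poly");
```

With these assumed sizes, `stays_in_block (16, 16, 64)` holds for a 16-byte access covering the last poly element and invln2_scaled, whereas a 16-byte access at offset 56 would span two 64-byte blocks and could therefore land on a page boundary.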


dorukkarademirler · 2025-01-28T19:10:47Z

For any issues or further communication related to this repository, please use my open source development email at Qualcomm: [email protected].

joeramsay · 2025-01-29T10:19:47Z

Thanks for your interest in contributing! Please could you provide some details of measured speedup, with your architecture and compiler? In case you don't know, you can use the mathbench binary to get microbench numbers.

Is there some way of achieving what you want without aligning invln2_scaled by 16? I see a small (2-3%) performance regression on Neoverse V1 with GCC 14 from this patch. I think this is because the alignment prevents LDP fusion with the last element of poly.
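One way to picture the suspected effect is to compare field offsets with and without the 16-byte alignment. The sketch below is only an illustration under stated assumptions: the struct names, the three-element poly, and the helper `adjacent_doubles` are hypothetical, and whether a compiler actually emits a single paired load for the two fields has not been checked here.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical: three coefficients, so poly ends at an offset that is not
   a multiple of 16.  */
#define FUSE_POLY_LEN 3

struct packed_sketch
{
  alignas (64) double poly[FUSE_POLY_LEN];
  double invln2_scaled; /* lands right after poly[2] */
};

struct padded_sketch
{
  alignas (64) double poly[FUSE_POLY_LEN];
  alignas (16) double invln2_scaled; /* padding is inserted before it */
};

/* Two doubles can be covered by one 16-byte paired load only if the second
   starts exactly where the first ends.  */
static int
adjacent_doubles (size_t first_off, size_t second_off)
{
  return second_off == first_off + sizeof (double);
}
```

Assuming 8-byte doubles, invln2_scaled sits at offset 24 in `packed_sketch`, directly after poly[2] at offset 16, but at offset 32 in `padded_sketch`, leaving an 8-byte gap between the two values.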

To merge this we need a signed contribution agreement, so that we can update GLIBC under our FSF copyright assignment. When the PR is ready to merge, please could you fill out https://github.com/ARM-software/optimized-routines/blob/master/contributor-agreement.pdf and email it to [email protected]? Printed/scanned is fine.

dorukkarademirler · 2025-01-29T20:26:38Z

This issue was fixed on Qualcomm's Android build, arm64 architecture. The main issue was that after updating to LLVM 18, the LDP instructions were crossing the page boundary with the original structure. These changes help improve performance and reduce data cache pressure. Rather than providing a speedup, they are aimed at preventing anomalies and significant performance loss. I am attaching an image of the performance loss observed.
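For reference, whether a given load straddles a page can be expressed as a small address check. This is a minimal sketch assuming 4K pages as in the PR description; the macro and function names are hypothetical and do not come from the repository.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* 4K pages, as in the PR description */

static_assert ((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0,
               "page size must be a power of two");

/* Return 1 if an access of size bytes starting at addr touches two pages.
   Assumes size >= 1.  */
static int
crosses_page (uintptr_t addr, size_t size)
{
  uintptr_t first_page = addr / SKETCH_PAGE_SIZE;
  uintptr_t last_page = (addr + size - 1) / SKETCH_PAGE_SIZE;
  return first_page != last_page;
}
```

For a 16-byte access, such as a load pair of two 64-bit registers, `crosses_page (4088, 16)` is true because the access covers bytes 4088 to 4103, while `crosses_page (4080, 16)` is false because it ends at byte 4095.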

Looking at the Geekbench results, libm.so's CPU usage was approximately 3% without page crossings. With page crossings, it increased to around 11%.

simpleperf record -e cpu-cycles results:

[Image] LLVM17: no page crossings.

[Image] LLVM18: second ldp crosses the page.

[Image] Performance Comparison

Regarding the small (2-3%) performance regression on Neoverse V1 with GCC 14, I committed a version where there isn't any alignment for invln2_scaled. You can check that version as well.

As for the contribution agreement, Qualcomm might already have an agreement with ARM. If I need to do this individually as well, I will send it.

Rearranged struct fields and added alignment attributes to prevent ld…

8fb6cbf

…p instructions from crossing page boundaries

Remove whitespaces

4f857bd

Removed the invln2_scaled alignment and restructured

71daa87
