
MDEV-36184 - To optimise dot_product in Power9 and Power10 architecture #3850

Open · wants to merge 3 commits into base: main
Conversation

mikejuliet13

This patch optimises the dot_product function by leveraging vectorisation through SIMD intrinsics. Specifically, the function now uses __builtin_vec_vupkhsh and __builtin_vec_vupklsh to efficiently widen input values from a narrower to a wider element type.
This transformation enables parallel execution of multiple operations, significantly improving the performance of the dot product computation on supported architectures.
Performance Analysis:
The original dot_product function does undergo auto-vectorisation when compiled with -O3. However, performance analysis has shown that the newly optimised implementation performs better on Power10 and achieves comparable performance on Power9 machines.

Output Changes:
The logical output of the dot_product function remains unchanged (i.e., it still computes the correct dot product).
With this patch, computations utilise vector registers, leading to improved performance. These optimisations are internal and do not alter any user-visible behaviour.

Potential Side Effects:

  • This patch introduces architecture-specific optimisations targeted at Power9 and Power10 systems.
  • The function has been extensively tested on both Power9 and Power10, where it demonstrates correctness and performance improvements.
  • If executed on an older architecture (e.g., Power8 or below) that lacks support for these vector instructions, the implementation automatically falls back to DEFAULT_IMPLEMENTATION, ensuring broader compatibility.

Release Notes:

  • Optimised the dot_product function using SIMD vectorisation for improved performance.
  • Introduces architecture-specific optimisations for Power9 and Power10 systems.
  • No changes to observable output; improvements are purely in internal computation efficiency.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Contributor @svoj left a comment

Looks good, thanks! Although needs some polishing.

@svoj svoj added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Feb 21, 2025
@vuvova
Member

vuvova commented Feb 21, 2025

The original dot_product function does undergo auto-vectorisation when compiled with -O3. However, performance analysis has shown that the newly optimised implementation performs better on Power10 and achieves comparable performance on Power9 machines.

Do you have any numbers? How does your implementation compare with auto-vectorization? Did you benchmark it? On Power 10 and Power 9? Auto-vectorization at -O3 by what compiler version?

@mikejuliet13 (Author)

The original dot_product function does undergo auto-vectorisation when compiled with -O3. However, performance analysis has shown that the newly optimised implementation performs better on Power10 and achieves comparable performance on Power9 machines.

Do you have any numbers? How does your implementation compare with auto-vectorization? Did you benchmark it? On Power 10 and Power 9? Auto-vectorization at -O3 by what compiler version?

I conducted benchmark tests on both Power9 and Power10 machines, comparing the time taken by the original (auto-vectorised) code and the new vectorised code. I used GCC 11.5.0 on RHEL 9.5 with -O3. The benchmarks were performed using a sample test code with a vector size of 4096 and 10⁷ loop iterations.
Here are the average execution times (in seconds) over multiple runs:
Power9:

  • Before change: ~16.364 s
  • After change: ~16.180 s
  • Performance gain is modest but measurable.

Power10:

  • Before change: ~8.989 s
  • After change: ~6.446 s
  • Significant improvement, roughly 28–30% faster.

The final results of the dot product remained the same before and after the change, confirming functional correctness.

@svoj svoj changed the title To optimise dot_product in Power9 and Power10 architecture MDEV-36184 - To optimise dot_product in Power9 and Power10 architecture Feb 26, 2025
mikejuliet13 and others added 2 commits February 27, 2025 18:19
Removed space before '='
Removed POWER_IMPLEMENTATION macro from before function definition
Using int64_t and vector long long for handling data overflow
Removed code for // Process remaining elements

Signed-off-by: Manjul Mohan <[email protected]>
Contributor @svoj left a comment

Nothing else on my mind, just some minor tweaks.
We will have to clean up the commit history (so that there is just one commit), but we should be able to do that on our side.


static FVector *align_ptr(void *ptr)
{
return (FVector *)(MY_ALIGN(((intptr)ptr) + alloc_header, POWER_bytes) - alloc_header);
Contributor:

This line should be under 80 characters.

}

// Sum the accumulated vector long long values into a scalar int64_t sum
sum+= static_cast<int64_t>(ll_sum[0]) + static_cast<int64_t>(ll_sum[1]);
Contributor:

With this code it feels like sum is redundant. At least sum+= definitely is.

{
int64_t sum= 0;
vector long long ll_sum= {0, 0}; // Using vector long long for int64_t accumulation
size_t base= ((len + POWER_dims - 1) / POWER_dims) * POWER_dims; // Round up to process full vector, including padding
Contributor:

These lines should be under 80 characters, e.g. move comments up front.

vector long long ll_sum= {0, 0}; // Using vector long long for int64_t accumulation
size_t base= ((len + POWER_dims - 1) / POWER_dims) * POWER_dims; // Round up to process full vector, including padding

for (size_t i= 0; i < base; i+= 8)
Contributor:

Should be i+= POWER_dims.


// Vectorized multiplication
vector int product_hi= x_hi * y_hi;
vector int product_lo= x_lo * y_lo;
Contributor:

Can't we make use of vec_mule() / vec_mulo() here? They seem to perform a widening multiply.
There seems to be nothing for a widening add, indeed.

Would it make sense to replace builtins with vec_unpackh / vec_unpackl at least? The mix looks really disturbing.

4 participants