Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incorrect migration of __shfl_xor_sync CUDA API within a template function #2189

Open
ArberSephirotheca opened this issue Jul 22, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@ArberSephirotheca
Copy link

ArberSephirotheca commented Jul 22, 2024

Describe the bug

When I tried to migrate a template function with just 1 loc in the function body.

#include <cuda_runtime.h>

template <typename KeyT, typename ValueT, uint32_t WORKGROUP_SIZE>
class WarpSort{

    __device__ static void swap(KeyT& key, ValueT& value,
                                 uint32_t const& step, uint32_t const& activemask,
                                 bool bDescending, std::true_type const& isKeyOnly)
    {
        __shfl_xor_sync(activemask, key, step, 32);

    }
}

I expect the __shfl_xor_sync would be migrated into dpct::permute_sub_group_by_xor. However, SYCLomatic does nothing to the code.
after migration:

#include <sycl/sycl.hpp>
#include <dpct/dpct.hpp>

template <typename KeyT, typename ValueT, uint32_t WORKGROUP_SIZE>
class WarpSort{

    static void swap(KeyT& key, ValueT& value,
                                 uint32_t const& step, uint32_t const& activemask,
                                 bool bDescending, std::true_type const& isKeyOnly)
    {
        __shfl_xor_sync(activemask, key, step, 32);

    }

To reproduce

#include <cuda_runtime.h>

template <typename KeyT, typename ValueT, uint32_t WORKGROUP_SIZE>
class WarpSort{

    __device__ static void swap(KeyT& key, ValueT& value,
                                 uint32_t const& step, uint32_t const& activemask,
                                 bool bDescending, std::true_type const& isKeyOnly)
    {
        __shfl_xor_sync(activemask, key, step, 32);

    }
}

run the above code with dpct

Environment

  • OS: Linux
  • Target device and vendor: Nvidia GPU
  • DPC++ version:Intel(R) oneAPI DPC++/C++ Compiler 2024.2.0 (2024.2.0.20240602)

Additional context

One interesting observation is when you change the type of key into explicit type name such as int& key, the migration success.
before migration:

#include <cuda_runtime.h>

template <typename KeyT, typename ValueT, uint32_t WORKGROUP_SIZE>
class WarpSort{

    __device__ static void swap(int& key, ValueT& value,
                                 uint32_t const& step, uint32_t const& activemask,
                                 bool bDescending, std::true_type const& isKeyOnly)
    {
        __shfl_xor_sync(activemask, key, step, 32);

    }
}

after migration:

template <typename KeyT, typename ValueT, uint32_t WORKGROUP_SIZE>
class WarpSort{

    static void swap(int& key, ValueT& value,
                                 uint32_t const& step, uint32_t const& activemask,
                                 bool bDescending, std::true_type const& isKeyOnly,
                                 const sycl::nd_item<3> &item_ct1)
    {
        /*
        DPCT1023:0: The SYCL sub-group does not support mask options for
        dpct::permute_sub_group_by_xor. You can specify
        "--use-experimental-features=masked-sub-group-operation" to use the
        experimental helper function to migrate __shfl_xor_sync.
        */
        dpct::permute_sub_group_by_xor(item_ct1.get_sub_group(), key, step);
    }
@ArberSephirotheca ArberSephirotheca added the bug Something isn't working label Jul 22, 2024
@tomflinda
Copy link
Contributor

tomflinda commented Jul 23, 2024

Hi @ArberSephirotheca,
The root cause is that when the type of key is not specified, the call type for __shfl_xor_sync is UnresolvedLookupExpr, shown as :

    | | `-CallExpr <line:10:9, col:50> '<dependent type>'
    | |   |-UnresolvedLookupExpr <col:9> '<overloaded function type>' lvalue (ADL) = '__shfl_xor_sync' 0xe8aacf0 0xe8aa638 0xe8a9960 0xe8a92b8 0xe8a76a0 0xe8a7008 0xe8a6628 0xe8a5c58
    | |   |-DeclRefExpr <col:25> 'const uint32_t':'const unsigned int' lvalue ParmVar 0xe93f3d8 'activemask' 'const uint32_t &'
    | |   |-DeclRefExpr <col:37> 'KeyT' lvalue ParmVar 0xe93f1c0 'key' 'KeyT &'
    | |   |-DeclRefExpr <col:42> 'const uint32_t':'const unsigned int' lvalue ParmVar 0xe93f358 'step' 'const uint32_t &'
    | |   `-IntegerLiteral <col:48> 'int' 32

While, when the type of key is specified, the call type for __shfl_xor_sync is initialized, shown as follows:

      | `-CallExpr <line:17:9, col:50> 'int'
      |   |-ImplicitCastExpr <col:9> 'int (*)(unsigned int, int, int, int)' <FunctionToPointerDecay>
      |   | `-DeclRefExpr <col:9> 'int (unsigned int, int, int, int)' lvalue Function 0xe8a5c58 '__shfl_xor_sync' 'int (unsigned int, int, int, int)'
      |   |-ImplicitCastExpr <col:25> 'uint32_t':'unsigned int' <LValueToRValue>
      |   | `-DeclRefExpr <col:25> 'const uint32_t':'const unsigned int' lvalue ParmVar 0xe93fa10 'activemask' 'const uint32_t &'
      |   |-ImplicitCastExpr <col:37> 'int' <LValueToRValue>
      |   | `-DeclRefExpr <col:37> 'int' lvalue ParmVar 0xe93f898 'key' 'int &'
      |   |-ImplicitCastExpr <col:42> 'int' <IntegralCast>
      |   | `-ImplicitCastExpr <col:42> 'uint32_t':'unsigned int' <LValueToRValue>
      |   |   `-DeclRefExpr <col:42> 'const uint32_t':'const unsigned int' lvalue ParmVar 0xe93f990 'step' 'const uint32_t &'
      |   `-IntegerLiteral <col:48> 'int' 32

For more detail, pls refer to https://godbolt.org/z/MeaYs9afv
After all, for this issue, we will plan to fix it, thanks for your report.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants