-
I have implemented a custom Vicsek-style alignment force using
I was wondering whether you could share experiences or recommendations for accelerating custom force computations like this. I know that building an external component (plugin) in C++/CUDA is possible, and I have looked at the example pair plugin and the pair plugin collection by @ianrgraham. However, accessing particle properties (velocities) does not seem trivial in these pair examples. Is there a way to do it? Alternatively, is there an easier route using CuPy kernels or Numba CUDA kernels that would yield similar performance? Thank you for your help.
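For readers unfamiliar with the force in question, here is a minimal CPU sketch of a Vicsek-style alignment interaction: each particle is pulled toward the mean velocity direction of its neighbors. The function name, the padded neighbor-array layout, and the exact force law are all illustrative assumptions, not the actual implementation or any HOOMD API:

```python
import numpy as np

def vicsek_alignment_force(velocities, neighbor_ids, n_neigh, strength=1.0):
    """Toy sketch of a Vicsek-style alignment force.

    Each particle feels a force of magnitude `strength` along the mean
    velocity direction of its neighbors. `neighbor_ids` is a padded
    (N, max_neigh) index array; `n_neigh[i]` gives particle i's neighbor
    count. All names and the force law are hypothetical, for illustration.
    """
    forces = np.zeros_like(velocities)
    for i in range(velocities.shape[0]):
        k = n_neigh[i]
        if k == 0:
            continue
        mean_v = velocities[neighbor_ids[i, :k]].mean(axis=0)
        norm = np.linalg.norm(mean_v)
        if norm > 0.0:
            forces[i] = strength * mean_v / norm
    return forces
```

The inner loop over neighbors is exactly the part that a CuPy or Numba CUDA kernel would parallelize, one thread per particle.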
-
Are you using this with
-
Thank you @joaander! I will look into implementing it directly in C++/CUDA. I am currently using a similar approach, which I hope works out: I wrote a CuPy CUDA kernel that gave some speedup, but profiling with NVIDIA Nsight Systems shows that a lot of data (the neighbor list) is still being transferred between device and host. This data should remain on the device. Is there a way to access the memory address of the
The code looks like this now:
-
local_pair_list is designed for convenience, not zero-copy. You should use gpu_local_nlist_arrays with cupy.
I still recommend a direct C++ implementation. This opens the possibility of using highly optimized multiple-threads-per-particle code, autotuners, parameter dictionaries, etc. If you are already writing C++ for cupy, then the only additional work is to add the C++/Python interface and to run cmake to configure and build the code.
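To illustrate the flat neighbor-list layout involved here, the sketch below walks a CSR-like structure on the CPU with NumPy: particle i's neighbors occupy nlist[head_list[i] : head_list[i] + n_neigh[i]]. This mirrors my understanding of the head_list/n_neigh/nlist arrays that gpu_local_nlist_arrays exposes, but the function name and example data are invented; on the GPU the same loop body would sit inside a cupy or Numba kernel operating on the zero-copy device arrays:

```python
import numpy as np

def neighbor_sum(quantity, head_list, n_neigh, nlist):
    """Sum a per-particle quantity over each particle's neighbors.

    Uses a flat (CSR-like) neighbor list: the neighbors of particle i
    are nlist[head_list[i] : head_list[i] + n_neigh[i]]. Hypothetical
    helper, written to mirror the head_list/n_neigh/nlist layout; it is
    not part of the HOOMD API.
    """
    out = np.zeros_like(quantity)
    for i in range(len(n_neigh)):
        start = head_list[i]
        neighbors = nlist[start:start + n_neigh[i]]
        out[i] = quantity[neighbors].sum(axis=0)
    return out
```

Keeping all four arrays on the device and indexing them inside a kernel is what avoids the host/device transfers seen in the profile.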