-
WMMA has not been maintained for a while. We welcome the community to fix it. If you just want to measure the performance quickly, you can use nvprof/Nsight to measure the kernel time of the WMMA unit tests (https://github.com/NVIDIA/cutlass/tree/master/test/unit/gemm/device). Alternatively, you can modify a WMMA unit test to time multiple iterations with cudaEvent, which gives a more accurate number.
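For anyone trying the cudaEvent route, here is a minimal sketch of the timing pattern. It assumes a stand-in kernel (`placeholder_kernel`, hypothetical and not part of CUTLASS) and arbitrary warm-up and iteration counts. In a real test you would replace the launch with the WMMA GEMM call being measured.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      fprintf(stderr, "CUDA error: %s (%s:%d)\n",                     \
              cudaGetErrorString(err_), __FILE__, __LINE__);          \
      return 1;                                                       \
    }                                                                 \
  } while (0)

// Hypothetical stand-in kernel; replace with the WMMA GEMM under test.
__global__ void placeholder_kernel(float *data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] = data[i] * 0.5f + 1.0f;
}

int main() {
  const int n = 1 << 20;
  const int warmup = 5;        // assumed warm-up count
  const int iterations = 100;  // assumed number of timed launches

  float *d = nullptr;
  CUDA_CHECK(cudaMalloc(&d, n * sizeof(float)));
  CUDA_CHECK(cudaMemset(d, 0, n * sizeof(float)));

  dim3 block(256);
  dim3 grid((n + block.x - 1) / block.x);

  // Warm-up launches so one-time initialization is not included in the timing.
  for (int i = 0; i < warmup; ++i) placeholder_kernel<<<grid, block>>>(d, n);
  CUDA_CHECK(cudaGetLastError());
  CUDA_CHECK(cudaDeviceSynchronize());

  cudaEvent_t start, stop;
  CUDA_CHECK(cudaEventCreate(&start));
  CUDA_CHECK(cudaEventCreate(&stop));

  // Record events around the whole batch, then average over the iterations.
  CUDA_CHECK(cudaEventRecord(start));
  for (int i = 0; i < iterations; ++i) placeholder_kernel<<<grid, block>>>(d, n);
  CUDA_CHECK(cudaEventRecord(stop));
  CUDA_CHECK(cudaEventSynchronize(stop));
  CUDA_CHECK(cudaGetLastError());

  float total_ms = 0.0f;
  CUDA_CHECK(cudaEventElapsedTime(&total_ms, start, stop));
  printf("average kernel time: %f ms over %d iterations\n",
         total_ms / iterations, iterations);

  CUDA_CHECK(cudaEventDestroy(start));
  CUDA_CHECK(cudaEventDestroy(stop));
  CUDA_CHECK(cudaFree(d));
  return 0;
}
```

Timing the whole batch with one pair of events and dividing by the iteration count amortizes launch overhead and event resolution, which is why it tends to be more stable than timing a single launch.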
-
Is this fixed already? I cannot seem to reproduce the issue. Things I've done:
Everything works except for a few incorrect results near the end. Is that the issue? Or perhaps it only happens with SM75, and I only have SM70 (Volta) machines?
-
I was trying to build the cutlass_profiler to see how GEMM kernels using the WMMA API perform. I uncommented this line, and now the build fails. Any idea what is going on? The issue can be reproduced by uncommenting the line mentioned and then building CUTLASS.