Skip to content

xziya/cubila

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

38 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CUDA Reduce

Introduction

This project is for personal practice on Reduce operation in CUDA language. The algorithms implemented are mainly referenced to cuda-samples of the reduction example without multiple-block cooperative groups feature. The implementations here finish the last mile including:

  1. synchronize across all blocks after partial results are derived in each block
  2. reduce up the partial results to derive the final result in one block instead of doing it on CPU as the cuda-samples does

With the last mile delivery it might impact the profiling results. Thus, an option final_reduce is provided to control the operation on partial results. Please refer to the below description for the command line usages.

Prerequisites

  • CUDA Toolkit >= 11
  • Environment variable:
    • CUDACXX: nvcc binary path to identify the CUDA Toolkit environment. The default value configured in project is /usr/local/cuda/bin/nvcc
  • gcc >= 10

Building

The project uses CMake for the build automation. With the following commands, anyone could have the project built and get the binary src/do_reduce in building directory.

mkdir -p build && cd build
cmake .. -GNinja
ninja

CLI

do_reduce --kernel=<int>    # Which kernel to be used. Supports 0-8
    --threads=<int>         # Block dimension, i.e. number of threads in a block
    --max_blocks=<int>      # Maximum number of blocks to be used
    --n=<int>               # The number of elements to be reduced up
    --final_reduce=<int>    # Whether to deliver the last mile. **0**, 1.

Due to the limitations in CUDA architecture, we cannot specify any value for some options. For example, the maximum number of x-axis threads in a block is 1024. Thus, --threads=1025 is illegal.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published