CUDA Reduce

Introduction

This project is personal practice with the reduce operation in CUDA. The algorithms mainly follow the reduction example in cuda-samples, without the multi-block cooperative groups feature. The implementations here also complete the last mile, which includes:

synchronizing across all blocks after each block has derived its partial result
reducing the partial results to the final result in a single block, instead of on the CPU as cuda-samples does

Completing the last mile may affect profiling results, so an option, final_reduce, is provided to control whether this step is performed on the partial results. See the CLI section below for command-line usage.
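The two last-mile steps and the final_reduce switch can be illustrated with a minimal sketch. It is not the project's implementation: the names block_reduce and reduce_on_device are hypothetical, and the cross-block synchronization here relies on a second kernel launch on the same stream, which is only one possible approach and may differ from what the project does. The sketch also assumes n > 0, a power-of-two block size, and omits error checking.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Each block reduces a slice of `in` into one partial sum, out[blockIdx.x].
// Assumption: blockDim.x is a power of two.
__global__ void block_reduce(const int* in, int* out, unsigned int n) {
    extern __shared__ int sdata[];
    unsigned int tid = threadIdx.x;
    unsigned int stride = blockDim.x * gridDim.x;
    int sum = 0;
    // Grid-stride loop, so a capped number of blocks can still cover all n elements.
    for (unsigned int i = blockIdx.x * blockDim.x + tid; i < n; i += stride) {
        sum += in[i];
    }
    sdata[tid] = sum;
    __syncthreads();
    // Tree reduction in shared memory.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = sdata[0];
}

// Hypothetical host helper mirroring the --threads, --max_blocks and
// --final_reduce options. Assumes host is non-empty.
int reduce_on_device(const std::vector<int>& host, int threads, int max_blocks,
                     bool final_reduce) {
    unsigned int n = static_cast<unsigned int>(host.size());
    int blocks = static_cast<int>((n + threads - 1) / threads);
    blocks = std::max(1, std::min(blocks, max_blocks));
    size_t smem = threads * sizeof(int);

    int *d_in = nullptr, *d_partial = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_partial, blocks * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // Step 1: every block writes one partial result.
    block_reduce<<<blocks, threads, smem>>>(d_in, d_partial, n);

    int result = 0;
    if (final_reduce) {
        // Step 2: a second launch on the same stream starts only after the
        // first kernel has finished, so all partial results are visible.
        // A single block then reduces them on the device.
        block_reduce<<<1, threads, smem>>>(d_partial, d_out, blocks);
        cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    } else {
        // Without the last mile: copy the partial results back and sum on the CPU.
        std::vector<int> partial(blocks);
        cudaMemcpy(partial.data(), d_partial, blocks * sizeof(int),
                   cudaMemcpyDeviceToHost);
        for (int p : partial) result += p;
    }

    cudaFree(d_in);
    cudaFree(d_partial);
    cudaFree(d_out);
    return result;
}
```

Both paths should return the same sum. They differ only in where the partial results are combined, which is why the choice can show up in profiling.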

Prerequisites

CUDA Toolkit >= 11
Environment variable:
- CUDACXX: path to the nvcc binary, used to locate the CUDA Toolkit. The default value configured in the project is /usr/local/cuda/bin/nvcc
gcc >= 10

Building

The project uses CMake for build automation. The following commands build the project and produce the binary src/do_reduce in the build directory.

mkdir -p build && cd build
cmake .. -GNinja
ninja

CLI

do_reduce --kernel=<int>    # Which kernel to use. Supports 0-8
    --threads=<int>         # Block dimension, i.e. number of threads in a block
    --max_blocks=<int>      # Maximum number of blocks to use
    --n=<int>               # Number of elements to reduce
    --final_reduce=<int>    # Whether to deliver the last mile: 0 (default) or 1

Due to limits of the CUDA architecture, some options accept only a restricted range of values. For example, a block can have at most 1024 threads along the x-axis, so --threads=1025 is illegal.
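As an illustration of such a limit, the sketch below checks a requested block size against the limits the device reports instead of hard-coding 1024. The helper name is_valid_block_size is hypothetical and not part of this project. It uses the CUDA runtime call cudaGetDeviceProperties and treats a failed query as an invalid configuration.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: returns true if `threads` is a legal one-dimensional
// (x-axis) block size on the given device.
bool is_valid_block_size(int threads, int device = 0) {
    cudaDeviceProp prop{};
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return false;  // no usable device, so nothing can be launched
    }
    return threads >= 1 &&
           threads <= prop.maxThreadsDim[0] &&   // per-axis limit (x)
           threads <= prop.maxThreadsPerBlock;   // total threads per block
}
```

A command-line front end could call such a check before launching a kernel and reject a value like --threads=1025 with a clear message.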
