Commit c36483f

Docs: Replace terms.md page with page that provides example of API mapping

1 parent ec18f61 commit c36483f
File tree

5 files changed: +180 −80 lines changed

docs/faq.rst (+2 −33)

@@ -65,39 +65,8 @@ platforms.
 Additional porting might be required to deal with architecture feature
 queries or CUDA capabilities that HIP doesn't support.
 
-How does HIP compare with OpenCL?
----------------------------------
-
-HIP offers several benefits over OpenCL:
-
-* Device code can be written in modern C++, including templates, lambdas,
-  classes and so on.
-* Host and device code can be mixed in the source files.
-* The HIP API is less verbose than OpenCL and is familiar to CUDA developers.
-* Porting from CUDA to HIP is significantly easier than from CUDA to OpenCL.
-* HIP uses development tools specialized for each platform: :doc:`amdclang++ <llvm-project:index>`
-  for AMD GPUs or `nvcc <https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html>`_
-  for NVIDIA GPUs, and profilers like :doc:`ROCm Compute Profiler <rocprofiler-compute:index>` or
-  `Nsight Systems <https://developer.nvidia.com/nsight-systems>`_.
-* HIP provides
-
-  * pointers and host-side pointer arithmetic.
-  * device-level control over memory allocation and placement.
-  * an offline compilation model.
-
-How does porting CUDA to HIP compare to porting CUDA to OpenCL?
----------------------------------------------------------------
-
-OpenCL differs from HIP and CUDA when considering the host runtime,
-but even more so when considering the kernel code.
-The HIP device code is a C++ dialect, while OpenCL is C99-based.
-OpenCL does not support single-source compilation.
-
-As a result, the OpenCL syntax differs significantly from HIP, and porting tools
-must perform complex transformations, especially regarding templates or other
-C++ features in kernels.
-
-To better understand the syntax differences, see :doc:`here<reference/terms>` or
-the :doc:`HIP porting guide <how-to/hip_porting_guide>`.
+To better understand the syntax differences, see :doc:`CUDA to HIP API Function Comparison <reference/api_syntax>`
+or the :doc:`HIP porting guide <how-to/hip_porting_guide>`.
 
 Can I install CUDA and ROCm on the same machine?
 ------------------------------------------------

docs/index.md (+1 −1)

@@ -45,7 +45,7 @@ The HIP documentation is organized into the following categories:
 * [HSA runtime API for ROCm](./reference/virtual_rocr)
 * [HIP math API](./reference/math_api)
 * [HIP environment variables](./reference/env_variables)
-* [Comparing syntax for different APIs](./reference/terms)
+* [CUDA to HIP API Function Comparison](./reference/api_syntax)
 * [List of deprecated APIs](./reference/deprecated_api_list)
 * [FP8 numbers in HIP](./reference/fp8_numbers)
 * {doc}`./reference/hardware_features`

docs/reference/api_syntax.rst (new file, +176)

.. meta::
   :description: Maps CUDA API syntax to HIP API syntax with an example
   :keywords: AMD, ROCm, HIP, CUDA, syntax, HIP syntax

********************************************************************************
CUDA to HIP API Function Comparison
********************************************************************************

This page introduces key syntax differences between the CUDA and HIP APIs with a
focused code example and a comparison table. For a complete list of mappings,
visit :ref:`HIPIFY <HIPIFY:index>`.

The following CUDA code example illustrates several CUDA runtime API functions.

.. code-block:: cpp

   #include <cstdlib>
   #include <iostream>
   #include <vector>

   #include <cuda_runtime.h>

   __global__ void block_reduction(const float* input, float* output, int num_elements)
   {
       extern __shared__ float s_data[];

       int tid = threadIdx.x;
       int global_id = blockDim.x * blockIdx.x + tid;

       if (global_id < num_elements)
       {
           s_data[tid] = input[global_id];
       }
       else
       {
           s_data[tid] = 0.0f;
       }
       __syncthreads();

       for (int stride = blockDim.x / 2; stride > 0; stride >>= 1)
       {
           if (tid < stride)
           {
               s_data[tid] += s_data[tid + stride];
           }
           __syncthreads();
       }

       if (tid == 0)
       {
           output[blockIdx.x] = s_data[0];
       }
   }

   int main()
   {
       int threads = 256;
       const int num_elements = 50000;

       std::vector<float> h_a(num_elements);
       std::vector<float> h_b((num_elements + threads - 1) / threads);

       for (int i = 0; i < num_elements; ++i)
       {
           h_a[i] = rand() / static_cast<float>(RAND_MAX);
       }

       float *d_a, *d_b;
       cudaMalloc(&d_a, h_a.size() * sizeof(float));
       cudaMalloc(&d_b, h_b.size() * sizeof(float));

       cudaStream_t stream;
       cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);

       cudaEvent_t start_event, stop_event;
       cudaEventCreate(&start_event);
       cudaEventCreate(&stop_event);

       cudaMemcpyAsync(d_a, h_a.data(), h_a.size() * sizeof(float), cudaMemcpyHostToDevice, stream);

       cudaEventRecord(start_event, stream);

       int blocks = (num_elements + threads - 1) / threads;
       block_reduction<<<blocks, threads, threads * sizeof(float), stream>>>(d_a, d_b, num_elements);

       cudaMemcpyAsync(h_b.data(), d_b, h_b.size() * sizeof(float), cudaMemcpyDeviceToHost, stream);

       cudaEventRecord(stop_event, stream);
       cudaEventSynchronize(stop_event);

       float milliseconds = 0.0f;
       cudaEventElapsedTime(&milliseconds, start_event, stop_event);
       std::cout << "Kernel execution time: " << milliseconds << " ms\n";

       cudaFree(d_a);
       cudaFree(d_b);

       cudaEventDestroy(start_event);
       cudaEventDestroy(stop_event);
       cudaStreamDestroy(stream);

       return 0;
   }
The following table maps the CUDA API functions to the corresponding HIP API
functions, as demonstrated in the preceding code example.

.. list-table::
   :header-rows: 1
   :name: syntax-mapping-table

   * - CUDA
     - HIP
   * - ``#include <cuda_runtime.h>``
     - ``#include <hip/hip_runtime.h>``
   * - ``cudaError_t``
     - ``hipError_t``
   * - ``cudaEvent_t``
     - ``hipEvent_t``
   * - ``cudaStream_t``
     - ``hipStream_t``
   * - ``cudaMalloc``
     - ``hipMalloc``
   * - ``cudaStreamCreateWithFlags``
     - ``hipStreamCreateWithFlags``
   * - ``cudaStreamNonBlocking``
     - ``hipStreamNonBlocking``
   * - ``cudaEventCreate``
     - ``hipEventCreate``
   * - ``cudaMemcpyAsync``
     - ``hipMemcpyAsync``
   * - ``cudaMemcpyHostToDevice``
     - ``hipMemcpyHostToDevice``
   * - ``cudaEventRecord``
     - ``hipEventRecord``
   * - ``cudaEventSynchronize``
     - ``hipEventSynchronize``
   * - ``cudaEventElapsedTime``
     - ``hipEventElapsedTime``
   * - ``cudaFree``
     - ``hipFree``
   * - ``cudaEventDestroy``
     - ``hipEventDestroy``
   * - ``cudaStreamDestroy``
     - ``hipStreamDestroy``

In summary, the mapping is largely a one-to-one rename of the ``cuda`` prefix to
``hip``; of the entries shown, only the runtime header include changes shape.

docs/reference/terms.md (−44)

This file was deleted.

docs/sphinx/_toc.yml.in (+1 −2)

@@ -111,8 +111,7 @@ subtrees:
   - file: reference/virtual_rocr
   - file: reference/math_api
   - file: reference/env_variables
-  - file: reference/terms
-    title: Comparing syntax for different APIs
+  - file: reference/api_syntax
   - file: reference/deprecated_api_list
     title: List of deprecated APIs
   - file: reference/fp8_numbers

0 commit comments