Commit 23f2f07

takahiroharada, KaoCC, jammm, mehmetoguzderin authored May 24, 2022
Feature/oro 0 radix sort (#19)
* [ORO-0] Working 8 bit radix sort.
* [ORO-0] Some optimization.
* Create LICENSE
* Update README.md (#15)
* Feature/oro 0 raw get set (#19)
* [ORO-0] Rename setter and getter.
* [ORO-0] Fix when there is a dll but no device.
* [ORO-0] Deletion function.
* [ORO-0] Multi processor count.
* [ORO-0] Extended the sort to more than 8 bits. Implemented tests.
* [ORO-0] Moved temp buffer allocation out from the sort().
* [ORO-0] README. References.
* [ORO-0] Debug flag.
* Refactor the code to add the basic constructs to support selecting different scan algorithms. Add different implementations of the scan algorithm: CPU, single WG and all WG.
  Signed-off-by: Chih-Chen Kao <[email protected]>
* Squashed commit of the following:
  commit 3f32bea2244653d59efb3c3eaa9433018dde5835
  Author: takahiroharada <[email protected]>
  Date: Wed Apr 13 10:48:35 2022 -0700
  [ORO-0] Fix nvrtc.
* Optimization: Implement the single-pass kernel for GPU parallel scan. Fix a GPU memory bug.
  Signed-off-by: Chih-Chen Kao <[email protected]>
* Feature/oro 0 kernel cache (#4)
* [ORO-0] Cache kernel.
* [ORO-0] Support newer HIP builds on windows (#22)
* [ORO-0] Unit test. (#23)
* Fix LDS scan bug. The previous implementation would lead to an error when the wavefront (warp) size is not equal to the size of a workgroup (block). Since not all threads run simultaneously, for input arrays larger than the wavefront size, the previous algorithm will not work because it performs the scan in-place on the input array. The results of one wavefront (warp) will be overwritten by work items (threads) in another wavefront (warp).
  Signed-off-by: Chih-Chen Kao <[email protected]>
* Optimize the LDS scan algorithm. (#6)
* Optimize the LDS scan algorithm. This version does not require a temp buffer and can support an LDS input size up to 2 times the workgroup size.
  Signed-off-by: Chih-Chen Kao <[email protected]>
* Support an input array in LDS that is 2 times the WG size.
  Signed-off-by: Chih-Chen Kao <[email protected]>
* Feature/oro 0 clean up (#7)
* Squashed commit of the following:
  commit 3f32bea2244653d59efb3c3eaa9433018dde5835
  Author: takahiroharada <[email protected]>
  Date: Wed Apr 13 10:48:35 2022 -0700
  [ORO-0] Fix nvrtc.
* [ORO-0] Clean up.
* Feature/oro 0 clean up (#10)
* Squashed commit of the following:
  commit 3f32bea2244653d59efb3c3eaa9433018dde5835
  Author: takahiroharada <[email protected]>
  Date: Wed Apr 13 10:48:35 2022 -0700
  [ORO-0] Fix nvrtc.
* [ORO-0] Clean up.
* [ORO-0] SortKernel1. Less complex. (#8)
  SortKernel (occupancy: 8) - vgpr: 128 - lds: 6704
  SortKernel1 (occupancy: 9) - vgpr: 106 - lds: 7720
* [ORO-0] Kernel execution time check.
* Fix the memory access pattern and change it to coalesced memory access. (#11)
  Signed-off-by: Chih-Chen Kao <[email protected]>
* [ORO-0] Single kernel sort for small keys. (#12)
* Optimize the Count kernel for less LDS usage to achieve full occupancy (#13)
* Optimize the Count kernel so that it uses less LDS and can achieve full occupancy.
  Signed-off-by: Chih-Chen Kao <[email protected]>
* Remove __threadfence_block(). Removes the boundary check in the inner loop. The upper bound is set only once before going into the loop.
  Signed-off-by: Chih-Chen Kao <[email protected]>
* Introduce DRIVER and RTC APIs
* Disable enum-variant
* Improve paths
* Add fields
* Update Vulkan test
* Define CUDA in terms of DRIVER and RTC
* Optimize the sort kernel: single-pass 8bit sort & parallel scan in 4bit sort. (#14)
* Fix a minor issue in CountKernel to make it more robust. Implement a single-pass 8-bit local sort. Implement a single-pass 8-bit local sort with shared bins.
  Signed-off-by: Chih-Chen Kao <[email protected]>
* Fix nItemsPerWI and enable the version with shared LDS.
  Signed-off-by: Chih-Chen Kao <[email protected]>
* [ORO-0] Print driver version.
* [ORO-0] Repro case.
* Fix SORT_WG_SIZE. Fix stable sort order.
  Signed-off-by: Chih-Chen Kao <[email protected]>
* Optimize sort kernel to remove inner boundary check. Adjust nItemsPerWI.
  Signed-off-by: Chih-Chen Kao <[email protected]>
  Co-authored-by: takahiroharada <[email protected]>
* Merging another merge (#18)
* Fix a minor issue in CountKernel to make it more robust. Implement a single-pass 8-bit local sort. Implement a single-pass 8-bit local sort with shared bins.
  Signed-off-by: Chih-Chen Kao <[email protected]>
* Fix nItemsPerWI and enable the version with shared LDS.
  Signed-off-by: Chih-Chen Kao <[email protected]>
* [ORO-0] Print driver version.
* [ORO-0] Repro case.
* Fix SORT_WG_SIZE. Fix stable sort order.
  Signed-off-by: Chih-Chen Kao <[email protected]>
* Optimize sort kernel to remove inner boundary check. Adjust nItemsPerWI.
  Signed-off-by: Chih-Chen Kao <[email protected]>
* Calculate the number of WGs based on LDS and max-thread-per-WGP. (#15)
* Calculate the number of WGs based on LDS and max-thread-per-WGP.
  Signed-off-by: Chih-Chen Kao <[email protected]>
* Add a workaround for CUDA.
  Signed-off-by: Chih-Chen Kao <[email protected]>
* Optimize the sort kernel: single-pass 8bit sort & parallel scan in 4bit sort. (#14)
* Fix a minor issue in CountKernel to make it more robust. Implement a single-pass 8-bit local sort. Implement a single-pass 8-bit local sort with shared bins.
  Signed-off-by: Chih-Chen Kao <[email protected]>
* Fix nItemsPerWI and enable the version with shared LDS.
  Signed-off-by: Chih-Chen Kao <[email protected]>
* [ORO-0] Print driver version.
* [ORO-0] Repro case.
* Fix SORT_WG_SIZE. Fix stable sort order.
  Signed-off-by: Chih-Chen Kao <[email protected]>
* Optimize sort kernel to remove inner boundary check. Adjust nItemsPerWI.
  Signed-off-by: Chih-Chen Kao <[email protected]>
  Co-authored-by: takahiroharada <[email protected]>
  Co-authored-by: takahiroharada <[email protected]>
  Co-authored-by: Chih-Chen Kao <[email protected]>
* Implement key-value pair sorting (#17)
* Add gitignore to the repository
  Signed-off-by: Chih-Chen Kao <[email protected]>
* Fix missing CUDA properties. (#16)
  Signed-off-by: Chih-Chen Kao <[email protected]>
* Add basic structure for key-value pair sorting. Fix an error in single pass sort
  Signed-off-by: Chih-Chen Kao <[email protected]>
* Add Value data in the test and sort it according to keys.
  Signed-off-by: Chih-Chen Kao <[email protected]>
* Support Key only sorting.
  Signed-off-by: Chih-Chen Kao <[email protected]>
* [ORO-0] Make single pass kernel non compile time switch.
* Support both Key-Only & Key-Value pair sort kernels
  Signed-off-by: Chih-Chen Kao <[email protected]>
* [ORO-0] Test change.
* [ORO-0] A bug.
* [ORO-0] NVIDIA occupancy computation fix. Test change. Tweak params to use single pass sort as much as possible.
  Co-authored-by: Takahiro Harada <[email protected]>
  Co-authored-by: takahiroharada <[email protected]>
* [ORO-0] Revert demo code.
* Fix missing CUDA properties. (#26)
* Update Orochi.cpp
* [ORO-0] Clean up.
* [ORO-0] OroUtils. (#27)
* [ORO-0] OroUtils.
* [ORO-0] Linux build fix.
* [ORO-0] Forgot to add.
* [ORO-0] Linux build fix.
* [ORO-0] Clean up.
  Co-authored-by: Chih-Chen Kao <[email protected]>
  Co-authored-by: Aaryaman Vasishta <[email protected]>
  Co-authored-by: Mehmet Oguz Derin <[email protected]>
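The commit message mentions an 8-bit radix sort extended to wider keys, a count step, a scan step, key-value pair sorting, and a stable sort order. As a rough illustration only, the following is a minimal CPU sketch of a least-significant-digit radix sort with those properties. It is not the GPU implementation from this commit: the function name `radixSortKeyValue`, the 32-bit key and value types, and the use of `std::vector` are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference, NOT the GPU kernels from this commit.
// Sorts keys (and their paired values) 8 bits per pass, starting from
// the least significant byte.
void radixSortKeyValue(std::vector<std::uint32_t>& keys,
                       std::vector<std::uint32_t>& values)
{
    assert(keys.size() == values.size());

    constexpr int kBitsPerPass = 8;
    constexpr std::size_t kBins = std::size_t{1} << kBitsPerPass;

    std::vector<std::uint32_t> keysTmp(keys.size());
    std::vector<std::uint32_t> valuesTmp(values.size());

    for (int shift = 0; shift < 32; shift += kBitsPerPass)
    {
        // 1. Count: histogram of the current 8-bit digit.
        std::array<std::size_t, kBins> offsets{};
        for (std::uint32_t k : keys)
            ++offsets[(k >> shift) & (kBins - 1)];

        // 2. Scan: an exclusive prefix sum turns counts into start offsets.
        std::size_t sum = 0;
        for (std::size_t& o : offsets)
        {
            const std::size_t count = o;
            o = sum;
            sum += count;
        }

        // 3. Scatter: visiting items in input order keeps equal digits in
        //    their original relative order, which makes each pass stable.
        for (std::size_t i = 0; i < keys.size(); ++i)
        {
            const std::size_t dst = offsets[(keys[i] >> shift) & (kBins - 1)]++;
            keysTmp[dst] = keys[i];
            valuesTmp[dst] = values[i];
        }

        // The output of this pass becomes the input of the next one.
        keys.swap(keysTmp);
        values.swap(valuesTmp);
    }
}
```

Calling `radixSortKeyValue(keys, values)` on two equally sized vectors leaves `keys` in ascending order and `values` permuted the same way. Stability of every pass is what lets later, more significant digits build on the ordering produced by earlier ones.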
1 parent d9343bc commit 23f2f07

54 files changed, +33684 -944 lines changed

‎.gitignore

23 additions, 4 deletions
@@ -1,5 +1,24 @@
-*.vs
-*.vscode
+**/Makefile
+**/obj
+**/*.log
+/cache
+dist/**
+!dist/*sh
+!dist/resultReference
+
+*.json
+*.filters
+*.sdf
+*.bc
+*.o
+
+!Resources/**
+*.vcxproj
+*.user
+*.sln
+*.opendb
+*.def
+.DS_Store
+.vs/
 build/
-dist/
-*.sln
+

‎LICENSE

21 additions, 202 deletions
@@ -1,202 +1,21 @@
-Apache License
-Version 2.0, January 2004
-http://www.apache.org/licenses/
-
-TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
-
-1. Definitions.
-
-"License" shall mean the terms and conditions for use, reproduction,
-and distribution as defined by Sections 1 through 9 of this document.
-
-"Licensor" shall mean the copyright owner or entity authorized by
-the copyright owner that is granting the License.
-
-"Legal Entity" shall mean the union of the acting entity and all
-other entities that control, are controlled by, or are under common
-control with that entity. For the purposes of this definition,
-"control" means (i) the power, direct or indirect, to cause the
-direction or management of such entity, whether by contract or
-otherwise, or (ii) ownership of fifty percent (50%) or more of the
-outstanding shares, or (iii) beneficial ownership of such entity.
-
-"You" (or "Your") shall mean an individual or Legal Entity
-exercising permissions granted by this License.
-
-"Source" form shall mean the preferred form for making modifications,
-including but not limited to software source code, documentation
-source, and configuration files.
-
-"Object" form shall mean any form resulting from mechanical
-transformation or translation of a Source form, including but
-not limited to compiled object code, generated documentation,
-and conversions to other media types.
-
-"Work" shall mean the work of authorship, whether in Source or
-Object form, made available under the License, as indicated by a
-copyright notice that is included in or attached to the work
-(an example is provided in the Appendix below).
-
-"Derivative Works" shall mean any work, whether in Source or Object
-form, that is based on (or derived from) the Work and for which the
-editorial revisions, annotations, elaborations, or other modifications
-represent, as a whole, an original work of authorship. For the purposes
-of this License, Derivative Works shall not include works that remain
-separable from, or merely link (or bind by name) to the interfaces of,
-the Work and Derivative Works thereof.
-
-"Contribution" shall mean any work of authorship, including
-the original version of the Work and any modifications or additions
-to that Work or Derivative Works thereof, that is intentionally
-submitted to Licensor for inclusion in the Work by the copyright owner
-or by an individual or Legal Entity authorized to submit on behalf of
-the copyright owner. For the purposes of this definition, "submitted"
-means any form of electronic, verbal, or written communication sent
-to the Licensor or its representatives, including but not limited to
-communication on electronic mailing lists, source code control systems,
-and issue tracking systems that are managed by, or on behalf of, the
-Licensor for the purpose of discussing and improving the Work, but
-excluding communication that is conspicuously marked or otherwise
-designated in writing by the copyright owner as "Not a Contribution."
-
-"Contributor" shall mean Licensor and any individual or Legal Entity
-on behalf of whom a Contribution has been received by Licensor and
-subsequently incorporated within the Work.
-
-2. Grant of Copyright License. Subject to the terms and conditions of
-this License, each Contributor hereby grants to You a perpetual,
-worldwide, non-exclusive, no-charge, royalty-free, irrevocable
-copyright license to reproduce, prepare Derivative Works of,
-publicly display, publicly perform, sublicense, and distribute the
-Work and such Derivative Works in Source or Object form.
-
-3. Grant of Patent License. Subject to the terms and conditions of
-this License, each Contributor hereby grants to You a perpetual,
-worldwide, non-exclusive, no-charge, royalty-free, irrevocable
-(except as stated in this section) patent license to make, have made,
-use, offer to sell, sell, import, and otherwise transfer the Work,
-where such license applies only to those patent claims licensable
-by such Contributor that are necessarily infringed by their
-Contribution(s) alone or by combination of their Contribution(s)
-with the Work to which such Contribution(s) was submitted. If You
-institute patent litigation against any entity (including a
-cross-claim or counterclaim in a lawsuit) alleging that the Work
-or a Contribution incorporated within the Work constitutes direct
-or contributory patent infringement, then any patent licenses
-granted to You under this License for that Work shall terminate
-as of the date such litigation is filed.
-
-4. Redistribution. You may reproduce and distribute copies of the
-Work or Derivative Works thereof in any medium, with or without
-modifications, and in Source or Object form, provided that You
-meet the following conditions:
-
-(a) You must give any other recipients of the Work or
-Derivative Works a copy of this License; and
-
-(b) You must cause any modified files to carry prominent notices
-stating that You changed the files; and
-
-(c) You must retain, in the Source form of any Derivative Works
-that You distribute, all copyright, patent, trademark, and
-attribution notices from the Source form of the Work,
-excluding those notices that do not pertain to any part of
-the Derivative Works; and
-
-(d) If the Work includes a "NOTICE" text file as part of its
-distribution, then any Derivative Works that You distribute must
-include a readable copy of the attribution notices contained
-within such NOTICE file, excluding those notices that do not
-pertain to any part of the Derivative Works, in at least one
-of the following places: within a NOTICE text file distributed
-as part of the Derivative Works; within the Source form or
-documentation, if provided along with the Derivative Works; or,
-within a display generated by the Derivative Works, if and
-wherever such third-party notices normally appear. The contents
-of the NOTICE file are for informational purposes only and
-do not modify the License. You may add Your own attribution
-notices within Derivative Works that You distribute, alongside
-or as an addendum to the NOTICE text from the Work, provided
-that such additional attribution notices cannot be construed
-as modifying the License.
-
-You may add Your own copyright statement to Your modifications and
-may provide additional or different license terms and conditions
-for use, reproduction, or distribution of Your modifications, or
-for any such Derivative Works as a whole, provided Your use,
-reproduction, and distribution of the Work otherwise complies with
-the conditions stated in this License.
-
-5. Submission of Contributions. Unless You explicitly state otherwise,
-any Contribution intentionally submitted for inclusion in the Work
-by You to the Licensor shall be under the terms and conditions of
-this License, without any additional terms or conditions.
-Notwithstanding the above, nothing herein shall supersede or modify
-the terms of any separate license agreement you may have executed
-with Licensor regarding such Contributions.
-
-6. Trademarks. This License does not grant permission to use the trade
-names, trademarks, service marks, or product names of the Licensor,
-except as required for reasonable and customary use in describing the
-origin of the Work and reproducing the content of the NOTICE file.
-
-7. Disclaimer of Warranty. Unless required by applicable law or
-agreed to in writing, Licensor provides the Work (and each
-Contributor provides its Contributions) on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
-implied, including, without limitation, any warranties or conditions
-of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
-PARTICULAR PURPOSE. You are solely responsible for determining the
-appropriateness of using or redistributing the Work and assume any
-risks associated with Your exercise of permissions under this License.
-
-8. Limitation of Liability. In no event and under no legal theory,
-whether in tort (including negligence), contract, or otherwise,
-unless required by applicable law (such as deliberate and grossly
-negligent acts) or agreed to in writing, shall any Contributor be
-liable to You for damages, including any direct, indirect, special,
-incidental, or consequential damages of any character arising as a
-result of this License or out of the use or inability to use the
-Work (including but not limited to damages for loss of goodwill,
-work stoppage, computer failure or malfunction, or any and all
-other commercial damages or losses), even if such Contributor
-has been advised of the possibility of such damages.
-
-9. Accepting Warranty or Additional Liability. While redistributing
-the Work or Derivative Works thereof, You may choose to offer,
-and charge a fee for, acceptance of support, warranty, indemnity,
-or other liability obligations and/or rights consistent with this
-License. However, in accepting such obligations, You may act only
-on Your own behalf and on Your sole responsibility, not on behalf
-of any other Contributor, and only if You agree to indemnify,
-defend, and hold each Contributor harmless for any liability
-incurred by, or claims asserted against, such Contributor by reason
-of your accepting any such warranty or additional liability.
-
-END OF TERMS AND CONDITIONS
-
-APPENDIX: How to apply the Apache License to your work.
-
-To apply the Apache License to your work, attach the following
-boilerplate notice, with the fields enclosed by brackets "[]"
-replaced with your own identifying information. (Don't include
-the brackets!) The text should be enclosed in the appropriate
-comment syntax for the file format. We also recommend that a
-file or class name and description of purpose be included on the
-same "printed page" as the copyright notice for easier
-identification within third-party archives.
-
-Copyright [yyyy] [name of copyright owner]
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-
+MIT License
+
+Copyright (c) 2022 Advanced Micro Devices, Inc.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
