-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic command buffer support to level zero adapter v2 #2532
base: main
Are you sure you want to change the base?
Conversation
…ied-runtime into add-command-buffer-support
@@ -14,6 +14,16 @@ | |||
|
|||
#include <ur_api.h> | |||
|
|||
#include "../common.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those includes should not be needed here. Also, this file (and queue_api.cpp) is auto-generate so you can't modify this manually. You need to update https://github.com/oneapi-src/unified-runtime/blob/main/scripts/templates/queue_api.hpp.mako and run make generate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, could you also add a comment to the queue_api.hpp.mako saying that queue_api.hpp is being auto-generated? We should have already added that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a small problem with removing all includes. As far as I understand, the queue_api.hpp.mako is being generated based on some file containing declarations (ur_api.hpp?), but all of these declarations only use structures that were defined inside UR (all starting with ur_*). However, I had to add function that uses ze_command_list_handle_t, because there is no respective class with ur_ prefix. And the problem is, that the ze_command_list_handle_t must be included, so I have to at least include "common.hpp" or "../common.hpp" - without that this simply won't work. Other option would be creating something like ur_command_list_handle_t, but I believe that should be a separate PR, because of scope of the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you need to use ze_command_list_handle_t
then it should be enough to just include ze_api.h
, just add it (and the enqueueCommandBuffer declaration) to the https://github.com/oneapi-src/unified-runtime/blob/main/scripts/templates/queue_api.hpp.mako and call make generate
#include "queue_api.hpp" | ||
|
||
struct command_buffer_profiling_t { | ||
ur_exp_command_buffer_sync_point_t NumEvents; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we are naming variables/params using with all lower-case in v2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, I believe that command_list_cache should also be changed, right? (Its fields are named starting with upper-case)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good point, we never got to fixing it
ur_context_handle_t Context; | ||
// Device associated with this command buffer | ||
ur_device_handle_t Device; | ||
ze_command_list_handle_t ZeCommandList; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use v2::raii::ze_command_list_handle_t from v2/common.hpp
* @param[out] CommandList The L0 command-list created by this function. | ||
* @return UR_RESULT_SUCCESS or an error code on failure | ||
*/ | ||
ur_result_t createMainCommandList(ur_context_handle_t Context, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to cache command lists like we do for non-command-buffer path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what the non-command-buffer path does, but for reference creating a pool of command-lists to use has been an idea we've had for the v1 adapter but never got around to. TODO comment that has since been removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In v2, we have https://github.com/oneapi-src/unified-runtime/blob/main/source/adapters/level_zero/v2/command_list_cache.hpp which could be used here.
|
||
ze_command_list_handle_t ZeCommandList = nullptr; | ||
UR_CALL(createMainCommandList(Context, Device, IsUpdatable, ZeCommandList)); | ||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please move the try/catch block to cover the entire function and add try/catch to other functions as well, i.e. urCommandBufferCreateExp(...) try { ... } catch (...) { return exceptionToResult(std::current_exception()); }
In v2, we have helpers functions that might throw so it's best to wrap every function with try/catch.
UR_CALL_THROWS(ur::level_zero::urDeviceRetain(Device)); | ||
} | ||
|
||
void ur_exp_command_buffer_handle_t_::cleanupCommandBufferResources() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think urContextRelease/urDeviceRelease
can actually fail.
I would suggest just removing this entire function and moving the logic to the destructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UR_CALL_THROWS
can throw an exception, which is bad practice to do in C++ destructors. So if we want to do this move I think removing that thow behaviour the main thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about urKernelRelease? I also need to release it when releasing buffer, so I would also need to know if it can fail, and I am not sure if it is a case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, technically, urKernelRelease can fail if zeKernelDestroy fails but if that happens and we throw from this function we can have a memory leak (if there are more unreleased kernels in the vector).
Also, is it really that useful to return an error from urCommandBufferReleaseExp?
Perhaps we could just log all the failures from urKernelRelease
, urContextRelease
, etc and always return UR_RESULT_SUCCESS in urCommandBufferReleaseExp
This would ensure that all resources are (attempted to be) freed, even if there is some error during release of one of them.
ur_exp_command_buffer_command_handle_t_(ur_exp_command_buffer_handle_t, | ||
uint64_t); | ||
|
||
virtual ~ur_exp_command_buffer_command_handle_t_(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn;t need to be virtual I think
@@ -19,6 +19,8 @@ from templates import helper as th | |||
* | |||
*/ | |||
|
|||
// This file was generated basing on scripts/templates/queue_api.cpp.mako |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
Do not edit. This file is auto generated from a template:
scripts/templates/queue_api.cpp.mako
} // namespace | ||
|
||
std::pair<ze_event_handle_t *, uint32_t> | ||
ur_exp_command_buffer_handle_t_::getWaitListView( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is identical to the implementation in queue. Please create simple WaitListView abstraction usable in both.
ze_command_list_handle_t &commandList) { | ||
|
||
using queue_group_type = ur_device_handle_t_::queue_group_info_t::type; | ||
// that should be call to queue getZeOrdinal, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This, together with the fact that we have no way to allocate events from the correct queue, makes me think we either need to defer creation of these objects until the first enqueue of the command buffer or urCommandBufferCreateExp
should take a queue.
@EwanC thoughts?
I'm also thinking whether it wouldn't make sense just to make the CommandBuffer allocate a whole pool of events from the context. That way, when we need an event, we don't have to acquire context locks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This, together with the fact that we have no way to allocate events from the correct queue, makes me think we either need to defer creation of these objects until the first enqueue of the command buffer or urCommandBufferCreateExp should take a queue.
@EwanC thoughts?
I'm not that keen on urCommandBufferCreateExp
taking a queue. 1) it doesn't match the SYCL API, where we don't have a queue object when creating the UR command-buffer. 2) It is the opposite direction to where the OpenCL WG is going with command-buffers in KhronosGroup/OpenCL-Docs#1292 to separate the queues used on command-buffer creation and enqueue.
Are there only 2 types of queue ordinal relevant here, compute and copy? If so I would suggest creating command-lists for both. This PR doesn't have the v1 functionality of splitting commands from the UR command-buffers into compute and copy command-lists. But we saw this having good perf benefits on V1, so would be imagine creating a copy engine command-list is something we'll need to do at some point anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there only 2 types of queue ordinal relevant here, compute and copy?
In v2 by default we are only going to use the compute ordinal, letting the UMD decide whether to offload the copy to a separate engine. So the answer here is: "just use compute". I was more thinking about the general direction.
What you say makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, letting the UMD handle this consideration definitely sounds like it will simplify the adapter code
checkImmediateAppendSupport(context); | ||
|
||
if (isUpdatable) { | ||
UR_ASSERT(context->getPlatform()->ZeMutableCmdListExt.Supported, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't use these "asserts". Do a normal if and return UR_RESULT_ERROR_UNSUPPORTED_FEATURE
.
I just dislike overloading the term "assert" to mean fail with error.
* @param[out] commandList The L0 command-list created by this function. | ||
* @return UR_RESULT_SUCCESS or an error code on failure | ||
*/ | ||
ur_result_t createMainCommandList(ur_context_handle_t context, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't used anywhere.
std::ignore = kernelAlternatives; | ||
std::ignore = command; | ||
try { | ||
UR_ASSERT(hKernel, UR_RESULT_ERROR_INVALID_NULL_HANDLE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is near identical to the queue enqueue implementation. I imagine a lot of other similar enqueue methods will be likewise similar.
We need to come up with an abstraction that lets us share this implementation between queue and command buffers.
While reviewing this, I had a thought that a command buffer is a subset of the queue functionality with some extra bits. This sounds like this should be solvable with composition:
class cmds { // enqueuable? command_list?
WaitViewList waitlist;
CmdList cmdlist;
event_pool events;
enqueue_kernel enqueue_kernel(...);
...
}
class queue {
cmds cmd;
}
class command_buffer {
cmds cmd;
}
@igchor thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, also, we can simplify the enqueue operations by combining getSignalEvent and getWaitListView functions. I would move this functionality to the ur_command_list_handler_t (we might want to rename it) and implement it like this:
struct ur_command_list_handler_t {
ur_command_list_handler_t(ur_context_handle_t hContext,
ur_device_handle_t hDevice,
const ur_queue_properties_t *pProps);
ur_command_list_handler_t(ze_command_list_handle_t hZeCommandList,
bool ownZeHandle);
std::tuple<ze_event_handle_t, uint32_t, ze_event_handle_t *>
getSignalEventAndWaitList(ur_event_handle_t *hUserEvent,
ur_command_t commandType,
const ur_event_handle_t *phWaitEvents,
uint32_t numWaitEvents);
raii::command_list_unique_handle commandList;
std::vector<ze_event_handle_t> waitList;
event_pool events;
};
then, in enqueue* we can do:
auto [signalEvent, numWaitEvents, waitEvent] = getSignalEventAndWaitList(...);
...
ZE2UR_CALL(zeCommandListAppenSomething(..., signalEvent, numWaitEvents, waitEvent));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, right now, getSIgnalEvent returns ur_event_handle_t (not ze_event_handle_t) because in enqueueTimestampRecordingExp, we need to call timestamp-related functions on it. But actually, we already have a pointer to the ur_event - it's the same one that we pass as a first argument to the getSignalEvent.
It's better to make getSignalEventAndWaitList return ze_event_handle_t so that we can avoid checking for nullptr.
std::pair<ze_event_handle_t *, uint32_t> | ||
getWaitListView(const ur_event_handle_t *phWaitEvents, | ||
uint32_t numWaitEvents); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private:
uint64_t); | ||
|
||
~ur_exp_command_buffer_command_handle_t_(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private:
} | ||
ur_result_t | ||
urCommandBufferRetainExp(ur_exp_command_buffer_handle_t hCommandBuffer) { | ||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please move the try before the first {
so it's like this:
ur_result_t
urCommandBufferRetainExp(ur_exp_command_buffer_handle_t hCommandBuffer) try {
...
} catch(..._ {
...
}
This way there's one less indentation level.
This pull request implements basic calls to command buffer in level zero v2 adapter. These calls are required by sycl graph functionality implemented inside llvm, such as record and replay.