Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add enable_options and disable_options to fd.execute #3270

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

Priya2698
Copy link
Collaborator

@Priya2698 Priya2698 commented Oct 24, 2024

This PR adds enable_options and disable_options to fd.execute to allow setting them through the python frontend in lieu of the environment variables.
This work will be used to allow enabling nvfuser matmul codegen from within Thunder.

Inspired by @jacobhinkle's PR #1905!

Tracking Issue: #3022

@Priya2698
Copy link
Collaborator Author

!build

@Priya2698 Priya2698 marked this pull request as ready for review October 24, 2024 23:33
@jacobhinkle
Copy link
Collaborator

jacobhinkle commented Oct 25, 2024

Thanks for taking the initiative with this! We will also need to update FusionExecutorCache to be option-aware so that we don't re-use kernels compiled with different options. That was part of the idea with #2077: separating options (features) into ones that affect each level of caching or not.

@Priya2698
Copy link
Collaborator Author

Thanks for taking the initiative with this! We will also need to update FusionExecutorCache to be option-aware so that we don't re-use kernels compiled with different options. That was part of the idea with #2077: separating options (features) into ones that affect each level of caching or not.

Got it -- we can add that support separately, what do you think? That will allow development on #3022. IIUC, we are currently missing this for the environment variables based approach as well so we may be reusing kernels which used different env variables?

@jacobhinkle
Copy link
Collaborator

IIUC, we are currently missing this for the environment variables based approach as well so we may be reusing kernels which used different env variables?

I guess this is true: a process can change its env vars at any time and we will not recognize that change after the first time we look up an option. However that is off-label use. If we are providing a kwarg to execute then a user would probably assume it's fully supported.

I wonder if we could do something quickly like use a separate FusionCache for each set of options we receive in the frontend. If no options are provided we'd use the default. That could be done entirely in python.

@Priya2698
Copy link
Collaborator Author

Priya2698 commented Oct 28, 2024

I wonder if we could do something quickly like use a separate FusionCache for each set of options we receive in the frontend. If no options are provided we'd use the default. That could be done entirely in python.

What do you mean by using separate FusionCache?

A fusion cache is initialized here:

FusionDefinition::FusionDefinition(std::optional<size_t> id, size_t max_length)
: FusionState(),
max_length_(max_length),
fusion_id_(id),
fusion_cache_(FusionCache::get()),
trie_node_(nullptr),
prev_fusion_(nullptr),
user_sched_(nullptr),
ops(this),
sched(this) {}
. The options are passed in the execute method so they are not available when the fusion definition is initialized.

I am still going through the FusionCache files.

@jacobhinkle
Copy link
Collaborator

. The options are passed in the execute method so they are not available when the fusion definition is initialized.

Hmmm. Yeah I guess you're right. FusionCache is not the right place to do this. Instead we might do it by having a separate FusionExecutorCache might work:

if (outputs.empty()) {
outputs = scheds->auto_gen_schedules->runFusionWithInputs(
inputs, std::nullopt, selected_device);
}

Here scheds->auto_gen_schedules is a unique_ptr<FusionExecutorCache>. This is fine as a default but when features are supplied, we should create a new cache and use it instead I think.

Comment on lines +190 to +203
if len(_enable_options) or len(_disable_options):
warnings.warn(
"Reset the FusionCache manually to avoid reusing kernels when re-executing the fusion definition with different options."
)

results = self._execute(
inputs,
device=device,
override_user_schedule=override_user_schedule,
capture_debug_output=capture_debug_output,
profile=profile,
_enable_options=_enable_options,
_disable_options=_disable_options,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, moving forward with renaming the kwargs to _enable/_disable_options. I added a warning about the possibility of kernel reuse so that we are aware of this when utilizing this feature.

@Priya2698
Copy link
Collaborator Author

!build

nvfuser/__init__.py Outdated Show resolved Hide resolved
@Priya2698
Copy link
Collaborator Author

!build

Copy link
Collaborator

@jacobhinkle jacobhinkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for making it clear that this is a somewhat unsafe option at least until we figure out the caching problem.

@Priya2698
Copy link
Collaborator Author

!test

@Priya2698
Copy link
Collaborator Author

!test --serde

@Priya2698
Copy link
Collaborator Author

!test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants