Ignore pop from an empty external ids stack in RoctracerLogger to avoid crash. #1006

Closed

sraikund16

Summary:
See D62090845 for the context.
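This diff tries to mimic the NVIDIA-side behavior.
For a workload/application similar to the one where dyno trace crashes on MI300X, the dyno trace on H100 looks like P1666484898. Searching it for the keyword `CUPTI_ERROR_QUEUE_EMPTY` and referring to
[NVIDIA's doc](https://docs.nvidia.com/cuda/archive/9.2/cupti/group__CUPTI__ACTIVITY__API.html#group__CUPTI__ACTIVITY__API_1g47395bf12ff55f30822d408b940567e3) suggests that the suspect fiber, after migrating to another thread, fails when it attempts to dequeue from NVIDIA's thread_local queue, just like what we saw on the AMD side.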
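
A minimal sketch of the idea in the title, assuming a per-thread stack of external correlation ids. `ExternalIdStack`, `tryPop`, `ignoredPops`, and `threadLocalStack` are hypothetical names for illustration, not the actual RoctracerLogger code. A pop on an empty stack is reported and counted instead of crashing, which is roughly how the `CUPTI_ERROR_QUEUE_EMPTY` return behaves on the NVIDIA side.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a stack of external correlation ids.
// Illustrative only; not the RoctracerLogger implementation.
class ExternalIdStack {
 public:
  void push(std::uint64_t id) { ids_.push_back(id); }

  // Pops the most recent id into *out (if out is non-null) and returns true.
  // If the stack is empty, the pop is ignored: nothing is modified except
  // the ignored-pop counter, and false is returned.
  bool tryPop(std::uint64_t* out) {
    if (ids_.empty()) {
      ++ignoredPops_;
      return false;
    }
    if (out != nullptr) {
      *out = ids_.back();
    }
    ids_.pop_back();
    return true;
  }

  // Number of pops that were ignored because the stack was empty.
  std::size_t ignoredPops() const { return ignoredPops_; }

 private:
  std::vector<std::uint64_t> ids_;
  std::size_t ignoredPops_ = 0;
};

// One stack per thread. Work that pushes on one thread and pops on another
// (for example, a fiber that migrated) finds the new thread's stack empty.
inline ExternalIdStack& threadLocalStack() {
  thread_local ExternalIdStack stack;
  return stack;
}
```

Returning a status rather than asserting keeps the tracer alive when push and pop land on different threads; the counter is one possible way to keep such events visible for debugging.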

Differential Revision: D64974651

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D64974651

@facebook-github-bot

This pull request has been merged in 5f5dc26.
