Improve CUDA decoding performance by ~2x using decoder caching #258
Conversation
Thanks @ahmadsharif1
I think a bit of documentation (as comments in the code) could be useful. Some questions that I think could be addressed in the docs are:
- What is cached
- When it is stored in the cache
- What the "hashing function" of the cache is (since that's different from the "What is cached" question)
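To make those questions concrete, here is a minimal sketch of the kind of per-device cache being discussed. It is hypothetical and not taken from this PR: the names (`getOrCreateCudaContext`, `g_cachedContexts`) are illustrative, and it assumes the cache key (the "hashing function") is simply the CUDA device index and that what is cached is the FFmpeg hardware device context.

```cpp
// Hypothetical sketch, not this PR's code: a process-wide cache of FFmpeg
// CUDA hardware device contexts, keyed by device index.
#include <mutex>
#include <string>
#include <vector>

extern "C" {
#include <libavutil/buffer.h>
#include <libavutil/hwcontext.h>
}

static std::mutex g_cacheMutex;
// One slot per CUDA device index; nullptr means "not created yet".
static std::vector<AVBufferRef*> g_cachedContexts;

// Returns a new reference to the cached context for deviceIndex, creating
// it on first use. Creation is the expensive step the cache amortizes.
AVBufferRef* getOrCreateCudaContext(int deviceIndex) {
  std::lock_guard<std::mutex> lock(g_cacheMutex);
  if (g_cachedContexts.size() <= static_cast<size_t>(deviceIndex)) {
    g_cachedContexts.resize(deviceIndex + 1, nullptr);
  }
  if (g_cachedContexts[deviceIndex] == nullptr) {
    int err = av_hwdevice_ctx_create(
        &g_cachedContexts[deviceIndex],
        AV_HWDEVICE_TYPE_CUDA,
        std::to_string(deviceIndex).c_str(),
        /*opts=*/nullptr,
        /*flags=*/0);
    if (err < 0) {
      return nullptr;  // Caller can fall back to creating its own context.
    }
  }
  // Hand out a fresh reference; the cache keeps its own alive.
  return av_buffer_ref(g_cachedContexts[deviceIndex]);
}
```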
```cpp
    const torch::Device& device,
    AVCodecContext* codecContext) {
  throwErrorIfNonCudaDevice(device);
  AVBufferRef* hw_device_ctx = codecContext->hw_device_ctx;
```
I don't see `hw_device_ctx` being used; if this line is still necessary, can you add a comment to explain why?
Good catch. It was dead code.
Thank you @ahmadsharif1
@ahmadsharif1 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Creating the GPU decoder is a very expensive process. To reduce the time we spend creating decoders, this PR now caches decoders and reuses them where possible.
This doesn't touch the CPU code.
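As a hedged illustration of where the reuse plugs in (again using the hypothetical `getOrCreateCudaContext` helper sketched above, not this PR's actual function names), the cached context is attached to each new codec context instead of being created from scratch:

```cpp
extern "C" {
#include <libavcodec/avcodec.h>
}

// Declared in the earlier sketch; illustrative only.
AVBufferRef* getOrCreateCudaContext(int deviceIndex);

// Attach the cached CUDA context to a freshly allocated decoder,
// skipping the expensive per-decoder CUDA initialization.
void attachCudaDevice(AVCodecContext* codecContext, int deviceIndex) {
  codecContext->hw_device_ctx = getOrCreateCudaContext(deviceIndex);
}
```

The design trade-off in such a scheme is that cached contexts live for the lifetime of the process, trading a small amount of GPU memory for much faster decoder creation.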
Results show a ~2x improvement in the benchmark:
Before:
After:
It also improves single-threaded GPU decoding:
Before:
After: