Explore Numba Function Inlining #910
Hey @seanlaw, hope you had a great day! This one looks like a relatively straightforward next contribution for me. Is this still an issue and, if so, is it cool if I take a crack at it?
@joehiggi1758 Sounds good. The most important thing here is to perform timing comparisons to ensure that adding this would actually make things faster. I recommend searching through previous issues where @NimaSarajpoor performed timing comparisons. It's possible that it doesn't make things faster.
Hey @seanlaw! I finally got a chance to look into this, and I've concluded that we would only see marginal improvements in the performance of `_compute_diagonal` and `_shift_insert_at_index`. Generally, I approached my timing comparisons for both functions the same way (and tried to stay in line with the approach @NimaSarajpoor took here)...

My results for `_compute_diagonal` were as follows...

My results for `_shift_insert_at_index` were as follows...

Please let me know if you have any questions, comments, or feedback, as I'd be more than happy to explain further! This one was a great learning opportunity for me and I had to do a decent amount of research, but I am more than open to areas of opportunity, as this was my first time doing a true timing comparison!
@joehiggi1758 This is great. I agree that it doesn't seem to make anything faster. Would you mind trying one more? Finally, would you mind posting a question to the Numba community about this?
@seanlaw you got it! I'll post the question. Regarding the varying levels of...
Perfect! Thank you. Recently, I tried inlining some other code and also found zero improvement, so this bewilders me.
@joehiggi1758 It also looks like we might add inlining to our GPU device functions as well. You should be able to test using Google Colab. @NimaSarajpoor Have you tried this already? It's possible that the CUDA compiler already aggressively inlines device functions automatically, so there might not be any gains here.
I came here to share a comment and noticed my name is mentioned already! Sorry for the late response! In fact, I did test it to see if it improves the recursive function I used in the 6-step FFT algorithm (NOT GPU functions). Since my function was recursive, I had to limit the depth. See the section...
In functions like `_stump`, the sub-functions `_compute_diagonal` and `core._shift_insert_at_index` may be called a lot. Therefore, we may gain some (small) performance improvement by adding `@njit(inline='always')` to those (and other) functions and possibly reduce the overhead cost of calling a function repeatedly. See more here.