nonMaxSuppressionAsync extremely slow #8320

Open
JijaProGamer opened this issue Jul 3, 2024 · 6 comments

Comments

@JijaProGamer

I use nonMaxSuppressionAsync in my code for getting rid of useless bounding boxes from a detection model, but this simple line of code:

console.time("nms");
const nms = await tf.image.nonMaxSuppressionAsync(boxes, scores, 30, detectionSettings.iouThreshold, detectionSettings.scoreThreshold);
console.timeEnd("nms");

takes 20 ms! That is 4 times as long as running the actual model:

console.time("model");
const res = model.execute(imgTensor);
console.timeEnd("model");

Since the model is pretty small and its input is only 256x256, I would expect it to be fast, but nonMaxSuppressionAsync should be fast too, even with thousands of bounding boxes. I cannot implement NMS myself, because the tensor.data() calls for the boxes, scores and classes seem to take 32 ms by themselves, even more than NMS.
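For reference, greedy NMS is a short loop once the box and score values are available as plain typed arrays. The following is a minimal sketch, not the TensorFlow.js implementation: it assumes the [y1, x1, y2, x2] box layout that tf.image.nonMaxSuppressionAsync documents, and the names greedyNms and iou are made up for illustration. It does nothing about the cost of tensor.data() itself, which still has to be paid to get the values onto the CPU.

```javascript
// Intersection over union of boxes i and j.
// boxes is a flat array of length 4 * n, each box stored as [y1, x1, y2, x2].
function iou(boxes, i, j) {
  const a = i * 4;
  const b = j * 4;
  const areaA = (boxes[a + 2] - boxes[a]) * (boxes[a + 3] - boxes[a + 1]);
  const areaB = (boxes[b + 2] - boxes[b]) * (boxes[b + 3] - boxes[b + 1]);
  const y1 = Math.max(boxes[a], boxes[b]);
  const x1 = Math.max(boxes[a + 1], boxes[b + 1]);
  const y2 = Math.min(boxes[a + 2], boxes[b + 2]);
  const x2 = Math.min(boxes[a + 3], boxes[b + 3]);
  const inter = Math.max(0, y2 - y1) * Math.max(0, x2 - x1);
  const union = areaA + areaB - inter;
  return union > 0 ? inter / union : 0;
}

// Hypothetical helper: returns the indices of the kept boxes, best score first.
function greedyNms(boxes, scores, maxOutputSize, iouThreshold, scoreThreshold) {
  // Keep only candidates above the score threshold, sorted by descending score.
  const order = [];
  for (let i = 0; i < scores.length; i++) {
    if (scores[i] > scoreThreshold) order.push(i);
  }
  order.sort((p, q) => scores[q] - scores[p]);

  const selected = [];
  for (const candidate of order) {
    if (selected.length >= maxOutputSize) break;
    // Drop the candidate if it overlaps too much with any box already kept.
    let keep = true;
    for (const kept of selected) {
      if (iou(boxes, candidate, kept) > iouThreshold) {
        keep = false;
        break;
      }
    }
    if (keep) selected.push(candidate);
  }
  return selected;
}

// Made-up example data: boxes 0 and 1 overlap heavily, box 2 is separate.
const demoBoxes = new Float32Array([0, 0, 10, 10, 1, 1, 11, 11, 20, 20, 30, 30]);
const demoScores = new Float32Array([0.9, 0.8, 0.7]);
console.log(greedyNms(demoBoxes, demoScores, 30, 0.5, 0.1)); // keeps boxes 0 and 2
```

The worst case is quadratic in the number of candidates, so how it compares with the built-in op depends on how many boxes survive the score threshold.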

@gaikwadrahul8
Contributor

Hi, @JijaProGamer

I apologize for the delayed response, and thank you for bringing this issue to our attention. Could you please share a minimal code snippet, a CodePen example, or your GitHub repo, along with complete steps to reproduce the behavior on our end, so that we can investigate this issue further?

Thank you for your cooperation and patience.

@JijaProGamer
Author

I'll post one as soon as I get home, in an hour or two. Thanks.

@JijaProGamer
Author

I've made a small GitHub repo (100 lines of code) so you can debug it quickly: https://github.com/JijaProGamer/NMS-Issue/blob/master/page.html

Just download the HTML file, click to upload an image, and on the second upload look at the console.time output for NMS and Model in the console. For images smaller than 512x512 the model takes 15 ms or less, while NMS takes 10 ms in a good case and usually 20-25 ms (RX 6750 XT, i5-12600KF, 32 GB RAM, if specs are needed).
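Since the repro asks to ignore the first upload, it can help to collect several timings and summarize only the runs after warm-up instead of reading single console.time lines. A minimal sketch follows; summarizeTimings is a hypothetical helper and the sample values are made up for illustration, not measurements from this issue.

```javascript
// Summarize per-run durations in milliseconds, skipping the first warm-up runs.
function summarizeTimings(samplesMs, warmupRuns = 1) {
  const steady = samplesMs.slice(warmupRuns);
  if (steady.length === 0) {
    throw new Error("need at least one sample after warm-up");
  }
  const sorted = [...steady].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Median: middle value, or the mean of the two middle values for even counts.
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return {
    runs: sorted.length,
    min: sorted[0],
    median,
    max: sorted[sorted.length - 1],
  };
}

// Made-up samples: one slow warm-up run followed by four steady-state runs.
const demoSamples = [650, 10, 25, 20, 22];
console.log(summarizeTimings(demoSamples));
```

Each sample would be taken with performance.now() before and after the awaited call, so that the measurement covers the finished work and not just the call returning.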

@JijaProGamer
Author

Any progress on this issue?

@shmishra99
Contributor

Hi @JijaProGamer ,

Sorry for the late reply! I ran your code on a few different images, and here's the breakdown on how long each part took to run:

model: 652.841064453125 ms
issue8320.html:88 nms: 13.803955078125 ms

issue8320.html:84 model: 32.6669921875 ms
issue8320.html:88 nms: 45.43994140625 ms

issue8320.html:84 model: 167.931884765625 ms
issue8320.html:88 nms: 158.8330078125 ms

The model is small and has few parameters, which generally results in fast execution. Additionally, tf.image.nonMaxSuppressionAsync involves complex computations for non-maximum suppression of bounding boxes, so its execution time can be significantly influenced by the characteristics of the input images (such as the number of bounding boxes and how much they overlap). You can review the source code here. If you have any suggestions for improving the algorithm, please let us know.

Thank You!!
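If the number of bounding boxes is what drives the cost, one option is to cut the candidate list down before NMS runs. This is a sketch under assumptions rather than something proposed in the thread: topKCandidates and gatherBoxes are hypothetical helpers, boxes are assumed to be a flat [y1, x1, y2, x2] array, and capping at k candidates can change the result compared with running NMS on every box.

```javascript
// Indices of up to k highest-scoring entries above scoreThreshold, best first.
function topKCandidates(scores, k, scoreThreshold) {
  const indices = [];
  for (let i = 0; i < scores.length; i++) {
    if (scores[i] > scoreThreshold) indices.push(i);
  }
  indices.sort((a, b) => scores[b] - scores[a]);
  return indices.slice(0, k);
}

// Copy the selected boxes (4 values each) into a new, smaller flat array.
function gatherBoxes(boxes, indices) {
  const out = new Float32Array(indices.length * 4);
  indices.forEach((src, dst) => {
    for (let c = 0; c < 4; c++) {
      out[dst * 4 + c] = boxes[src * 4 + c];
    }
  });
  return out;
}

// Made-up example: five candidates, keep at most the two best above 0.1.
const candidateScores = [0.2, 0.9, 0.05, 0.6, 0.7];
const candidateBoxes = new Float32Array([
  0, 0, 1, 1,
  2, 2, 3, 3,
  4, 4, 5, 5,
  6, 6, 7, 7,
  8, 8, 9, 9,
]);
const keptIndices = topKCandidates(candidateScores, 2, 0.1);
console.log(keptIndices); // indices 1 and 4
console.log(gatherBoxes(candidateBoxes, keptIndices));
```

The indices returned by NMS on the reduced arrays refer to positions in keptIndices, so they need to be mapped back through it to recover the original box indices.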


This issue has been marked stale because it has no recent activity since 7 days. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale label Sep 17, 2024