Unexpected Output Shape After ONNX Conversion #171

Open
amarwingxpand opened this issue Feb 13, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@amarwingxpand

I'm converting the YOLOv9 model to ONNX for use with NVIDIA DeepStream. Inside FastModelLoader, the _create_onnx_model function appears to handle the PyTorch-to-ONNX conversion. However, when I run this function, it outputs a list of 18 tensors with shapes like:

Output[0] shape: (1, 80, 80, 80)
Output[1] shape: (1, 16, 4, 80, 80)
Output[2] shape: (1, 4, 80, 80)
Output[3] shape: (1, 80, 40, 40)
Output[4] shape: (1, 16, 4, 40, 40)
Output[5] shape: (1, 4, 40, 40)
Output[6] shape: (1, 80, 20, 20)
Output[7] shape: (1, 16, 4, 20, 20)
Output[8] shape: (1, 4, 20, 20)
Output[9] shape: (1, 80, 80, 80)
Output[10] shape: (1, 16, 4, 80, 80)
Output[11] shape: (1, 4, 80, 80)
Output[12] shape: (1, 80, 40, 40)
Output[13] shape: (1, 16, 4, 40, 40)
Output[14] shape: (1, 4, 40, 40)
Output[15] shape: (1, 80, 20, 20)
Output[16] shape: (1, 16, 4, 20, 20)
Output[17] shape: (1, 4, 20, 20)

This is unexpected, as DeepStream typically expects a single output tensor or structured outputs containing bounding boxes (batch_size, num_boxes, 4), class confidence scores (batch_size, num_boxes, num_classes), and objectness scores (batch_size, num_boxes, 1).

How should I interpret these tensors and correctly format them for inference in DeepStream?

@amarwingxpand amarwingxpand added the bug Something isn't working label Feb 13, 2025
@henrytsui000
Member

Hi,

You may check out the usage of PostProcess.

Currently, the model outputs predictions at three resolutions (20×20, 40×40, and 80×80 grids). For each resolution, it produces three types of outputs:
• 80 → per-class predictions (one channel per class)
• 16×4 → distribution over 16 bins for each of the four box sides (DFL-style grid information)
• 4 → bounding box coordinates

Additionally, these outputs come from two branches: auxiliary and main. This results in a total of:
3 levels × 3 output types × 2 branches = 18 outputs.
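The enumeration above can be sketched with a small numpy snippet (not the repository's code) that partitions the 18 tensors by branch and grid level. Which half of the output list is the auxiliary branch is an assumption here; verify the ordering against the ONNX graph's output names.

```python
import numpy as np

# One branch's shapes, as listed in the issue; the exporter emits this set
# twice (assumed here: auxiliary branch first, then main branch).
shapes = [
    (1, 80, 80, 80), (1, 16, 4, 80, 80), (1, 4, 80, 80),   # 80x80 grid
    (1, 80, 40, 40), (1, 16, 4, 40, 40), (1, 4, 40, 40),   # 40x40 grid
    (1, 80, 20, 20), (1, 16, 4, 20, 20), (1, 4, 20, 20),   # 20x20 grid
]
outputs = [np.zeros(s, dtype=np.float32) for s in shapes * 2]  # 18 tensors

aux, main = outputs[:9], outputs[9:]   # keep only the main branch for inference
for level in range(3):                 # class, DFL, box tensors per grid level
    cls, dfl, box = main[3 * level : 3 * level + 3]
    print(cls.shape, dfl.shape, box.shape)
```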

Typically, we use PostProcess to select the main branch’s outputs and apply NMS to the predictions. The shape (batch_size, num_boxes, 4) is not robust because the number of boxes varies for each image. Customizing PostProcess with padding may help address this issue.
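As a rough illustration of the decoding step (a numpy sketch, not the PostProcess API): a softmax over the 16 DFL bins gives a probability distribution per box side, and its expectation yields the four distances, which NMS-pruned boxes are then built from.

```python
import numpy as np

def dfl_decode(dfl):
    """Decode one (16, 4, H, W) DFL tensor into (4, H, W) expected distances."""
    bins = np.arange(dfl.shape[0], dtype=np.float32)      # bin values 0..15
    p = np.exp(dfl - dfl.max(axis=0, keepdims=True))      # stable softmax
    p /= p.sum(axis=0, keepdims=True)                     # over the 16 bins
    return np.tensordot(bins, p, axes=(0, 0))             # expectation per side

dfl = np.random.randn(16, 4, 20, 20).astype(np.float32)   # dummy 20x20 level
dist = dfl_decode(dfl)
print(dist.shape)  # (4, 20, 20)
```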

Best regards,
Henry Tsui

@ramonhollands
Contributor

Hi @henrytsui000,

Would be nice if we could add some additional model layers on export to do the post processing inside the model. What do you think about this?
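For reference, a hedged numpy sketch of what such an in-graph head could emit: a single tensor of shape (batch, num_boxes, 4 + num_classes), which is the layout DeepStream-style parsers consume. A real export would implement this as model layers; the reshaping logic is the same.

```python
import numpy as np

num_classes = 80
levels = [(80, 80), (40, 40), (20, 20)]                   # main-branch grids
cls_outs = [np.zeros((1, num_classes, h, w), np.float32) for h, w in levels]
box_outs = [np.zeros((1, 4, h, w), np.float32) for h, w in levels]

flat = []
for cls, box in zip(cls_outs, box_outs):
    b, _, h, w = cls.shape
    cls = cls.reshape(b, num_classes, h * w).transpose(0, 2, 1)
    box = box.reshape(b, 4, h * w).transpose(0, 2, 1)
    flat.append(np.concatenate([box, cls], axis=-1))      # (b, h*w, 4 + classes)
fused = np.concatenate(flat, axis=1)                      # all levels stacked
print(fused.shape)  # (1, 8400, 84)
```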

Best regards,
Ramon

@ramonhollands
Contributor

Did some experiments, and it seems to work fine for CoreML support. It saves tons of Swift code ;-)
