Vec2Box mismatching tensor devices on v9-m and v9-s only #145

Open
jnt0rrente opened this issue Jan 1, 2025 · 2 comments
Labels: bug (Something isn't working)

Comments


jnt0rrente commented Jan 1, 2025

Describe the bug

When creating a converter by following the hf_demo example, behavior differs depending on model size. For some sizes this raises:

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor.

This happens when DEFAULT_MODEL is v9-m or v9-s, and does not happen with either v9-c or v7. Tracing the problem through the yolo code yielded no results, so I am asking here in case this is an oversight on my part.

To Reproduce

This is the relevant code; no other part of my code interfaces with the yolo module. I believe copying and pasting this will reproduce the exact same result.

# Imports assumed from the hf_demo example: torch, OmegaConf, and the
# yolo package's public helpers.
import torch
from omegaconf import OmegaConf

from yolo import (
    AugmentationComposer,
    NMSConfig,
    PostProcess,
    create_converter,
    create_model,
)

DEFAULT_MODEL = "v9-m"
IMAGE_SIZE = (640, 640)

def load_model(model_name, device):
    model_cfg = OmegaConf.load(f"./modules/yolo_config/model/{model_name}.yaml")
    model_cfg.model.auxiliary = {}  # drop the auxiliary (training-only) branch
    model = create_model(model_cfg, True)
    model.to(device).eval()
    return model, model_cfg

class YoloTracker:
    component_name = 'yolo_tracker'

    def __init__(self):
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model, self.model_cfg = load_model(DEFAULT_MODEL, self.device)
        # The converter is created after the model has already been moved to CUDA
        self.converter = create_converter(self.model_cfg.name, self.model, self.model_cfg.anchor, IMAGE_SIZE, self.device)
        self.class_list = ['Person']  # OmegaConf.load("./modules/yolo_config/dataset/coco.yaml").class_list
        self.transform = AugmentationComposer([])

        nms_confidence = 0.5
        nms_iou = 0.5
        max_bbox = 100

        nms_config = NMSConfig(nms_confidence, nms_iou, max_bbox)
        self.post_process = PostProcess(self.converter, nms_config)

Expected behavior

I would expect the issue to arise either on all model sizes or on none of them.

System Info:

  • OS: Arch 6.12.7-arch1-1
  • Python Version: 3.10.16
  • PyTorch Version: 2.5.1+cu124
  • CUDA/cuDNN/MPS Version: 12.7
  • YOLO Model Version: not applicable
jnt0rrente added the bug label on Jan 1, 2025
jnt0rrente (Author) commented

On further investigation, I have traced the issue to the dummy forward pass used for auto-anchor size inference, which runs because the faulty model configurations do not provide the stride parameter.
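
A minimal sketch of that failure mode (a hypothetical standalone reduction, not the actual yolo internals): a dummy input created on CPU is fed through weights that already live on CUDA.

import torch
import torch.nn as nn

# Hypothetical reduction of the mismatch: CUDA weights, CPU dummy input.
model = nn.Conv2d(3, 16, kernel_size=3).cuda()   # weight type: torch.cuda.FloatTensor
dummy = torch.zeros(1, 3, 640, 640)              # input type:  torch.FloatTensor (CPU)
model(dummy)  # raises the RuntimeError quoted above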

henrytsui000 (Member) commented

Hi,

Thank you for reporting this issue and providing detailed environment information.

The bug has been fixed in commit 5f0e785. The issue was caused by the way we set up the converter; it has been resolved by moving the model to the correct device only after building the converter.
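
If you cannot update yet, here is a minimal workaround sketch along the same lines; it reuses the names from the reproduction code above, so treat the exact signatures as an assumption rather than the library's canonical API:

def load_model_and_converter(model_name, device):
    model_cfg = OmegaConf.load(f"./modules/yolo_config/model/{model_name}.yaml")
    model_cfg.model.auxiliary = {}
    model = create_model(model_cfg, True)  # model is still on CPU here
    # Build the converter first: its stride-inferring dummy pass now runs
    # with the model and the dummy input on the same (CPU) device.
    converter = create_converter(model_cfg.name, model, model_cfg.anchor, IMAGE_SIZE, device)
    model.to(device).eval()  # move to the target device only afterwards
    return model, model_cfg, converter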

Best regards,
Henry Tsui
