Taking multiple inputs from different cameras and deciding the result #13140
Unanswered
itskhaledmohammad
asked this question in Q&A
Hi everyone, @glenn-jocher and team, @UltralyticsAssistant,
It would be really helpful if you could help me with this.
I am working on a project with 3 cameras: one from the top and two at a 45-degree angle, all pointing at a tray.
Something like this:
## What and why
What I am trying to achieve: I will take video feeds from 3 inputs, and YOLO will detect which items are on the tray. An item might be visible in one camera but invisible or only partially visible in another, and YOLO might be unconfident about an object in one camera while detecting the same object with high confidence from another. For example, a beverage can might not be detectable from the top (you can only see the opening, and most beverage cans have similar-looking tops), but it is detectable from another angle where the body can be seen. Hence the multiple cameras.
Questions:
How do I take input from multiple cameras and predict the objects by combining all 3 results? What procedure should I follow?
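To illustrate what I mean by combining the results, here is a minimal sketch. This is not an official Ultralytics API; `fuse_detections` is a name I made up, and the fusion rule (keep the highest confidence seen from any view) is just one simple option:

```python
# Sketch: run one YOLO model on a frame from each camera, then merge the
# per-class results by keeping the best confidence from any view.

def fuse_detections(per_view_results):
    """per_view_results: one list per camera of (class_name, confidence)
    tuples. Returns {class_name: best_confidence_across_views}."""
    fused = {}
    for detections in per_view_results:
        for cls, conf in detections:
            if conf > fused.get(cls, 0.0):
                fused[cls] = conf
    return fused

# In practice each inner list would come from something like
#   results = model(frame)   # Ultralytics YOLO inference per camera frame
# Here we use made-up detections just to show the fusion step.
views = [
    [("beverage_can", 0.31)],                   # top camera: low confidence
    [("beverage_can", 0.88), ("chips", 0.90)],  # 45-degree camera: clear view
    [("chips", 0.72)],
]
print(fuse_detections(views))  # → {'beverage_can': 0.88, 'chips': 0.9}
```

A per-class max is the simplest rule; a voting scheme (require a detection in at least 2 of 3 views) would trade recall for fewer false positives.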
As you can see in the picture, I tried to keep the background a constant black, hoping that will help with training and prediction. Is there a particular color other than black that might help more, such as white or stripes?
Since I have a fixed background, how many pictures of each object do I need? The documentation says > 1500 images per class, but given my scenario do I really need that many? Taking 1500 images of each food item would be really hard every time. If I take around 36-40 pictures of each item (each item will be a class) and then augment them, would that work? If yes, what type of augmentation do you recommend?
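For reference, this is the kind of augmentation setup I had in mind. The parameter names (`hsv_h`, `degrees`, `mosaic`, `copy_paste`, etc.) are real Ultralytics `train()` arguments, but the values below are my guesses for a fixed-camera, fixed-background setup, not recommendations from the docs:

```python
# Assumed augmentation hyperparameters for a fixed tray-camera setup.
aug_params = {
    "hsv_h": 0.015,    # slight hue jitter for lighting variation
    "hsv_s": 0.5,      # saturation jitter
    "hsv_v": 0.4,      # brightness jitter
    "degrees": 10.0,   # small rotations (items placed at any angle on the tray)
    "translate": 0.1,  # shift items around the frame
    "scale": 0.3,      # size variation
    "fliplr": 0.5,     # horizontal flips
    "mosaic": 1.0,     # stitches images together -> multi-item scenes for free
    "copy_paste": 0.3, # pastes instances between images (needs segment labels)
}

# Then (assuming ultralytics is installed and tray.yaml describes my dataset):
# from ultralytics import YOLO
# model = YOLO("yolov8n.pt")
# model.train(data="tray.yaml", epochs=100, **aug_params)
```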
Also, when training on a new item I would take a single item and capture multiple images of it, so there would be a single instance of that item in every training image. In the real scenario, however, items will appear together, say two instances of a pack of chips plus beverages. Would that be a problem, and is there a particular way to train for that?
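One idea I considered (a hypothetical sketch, since my background is constant black) is synthesizing multi-item scenes by pasting single-item crops onto a black canvas and generating the boxes at the same time. The function name and layout are mine, not anything from Ultralytics:

```python
import numpy as np

def composite_items(bg_shape, crops, positions):
    """Paste item crops onto a black background at (y, x) top-left positions
    to synthesize multi-item scenes. Returns the scene and pixel-coordinate
    boxes (x1, y1, x2, y2) for each pasted crop."""
    scene = np.zeros(bg_shape, dtype=np.uint8)  # constant black tray background
    boxes = []
    for crop, (y, x) in zip(crops, positions):
        h, w = crop.shape[:2]
        scene[y:y + h, x:x + w] = crop
        boxes.append((x, y, x + w, y + h))
    return scene, boxes

# Two copies of the same 50x40 "item" pasted into one 1080x1920 scene,
# mimicking the "two packs of chips" case from a single-item photo set.
item = np.full((50, 40, 3), 200, dtype=np.uint8)
scene, boxes = composite_items((1080, 1920, 3), [item, item],
                               [(100, 100), (300, 900)])
```

The boxes would then be converted to YOLO's normalized label format. Mosaic augmentation during training achieves something similar automatically.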
The cameras are 1080p, so the images I am training on are 1920x1080. Is that a problem, or do I have to downscale them? If so, to what resolution?
I am using OAK-D cameras, which are stereo cameras. Can I use the depth data (or any other data they provide) to improve detection or otherwise help in my case?
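One use of depth I can imagine (purely a sketch with made-up names and a made-up calibration range) is gating detections by distance: since the tray is at a known distance, anything detected at a very different depth is probably background. The aligned depth map would come from the OAK-D stereo pipeline; here it is synthetic:

```python
import numpy as np

def depth_filter(detections, depth_map, tray_range=(400, 900)):
    """Drop detections whose median depth (mm) inside the box falls outside
    the expected tray distance. `tray_range` is an assumed calibration value."""
    kept = []
    for (x1, y1, x2, y2, cls, conf) in detections:
        roi = depth_map[y1:y2, x1:x2]
        valid = roi[roi > 0]  # 0 means no stereo match in OAK-D depth frames
        if valid.size and tray_range[0] <= np.median(valid) <= tray_range[1]:
            kept.append((x1, y1, x2, y2, cls, conf))
    return kept

# Synthetic example: one box on the tray plane (~600 mm), one on a distant
# background object (~2000 mm) that should be rejected.
depth = np.full((1080, 1920), 2000, dtype=np.uint16)
depth[400:700, 800:1200] = 600
dets = [(850, 450, 1100, 650, "beverage_can", 0.9),
        (100, 100, 300, 300, "chips", 0.6)]
print(depth_filter(dets, depth))  # keeps only the beverage_can detection
```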
I went through the Ultralytics documentation, and these questions came up. Thanks in advance; hoping to get guidance from you all.