Integrate transformers library. #915

Open
bw4sz opened this issue Feb 8, 2025 · 2 comments
Labels
API This tag is used for small improvements to the readability and usability of the python API.

Comments

bw4sz (Collaborator) commented Feb 8, 2025

(Original message, repurposed for the general task.)
What would it take to get transformers as a dependency of DeepForest?

https://huggingface.co/docs/transformers/main/en/model_doc/owlv2

Seems interesting for open-set learning.

bw4sz changed the title from "Transformers library owlv2" to "Integrate transformers library." on Feb 20, 2025
bw4sz (Collaborator, Author) commented Feb 20, 2025

DeepForest currently relies on torchvision models; we would like to expand this to include Hugging Face's model set in transformers.

Roadmap

  1. Add transformers to the requirements, conda, and setup dependencies.
  2. Add a simple transformers model to https://github.com/weecology/DeepForest/tree/main/src/deepforest/models
  3. Add an attribute to deepforest.main that records whether we are using the torchvision or the transformers data format; the transformers format looks to be documented here: https://huggingface.co/docs/transformers/tasks/object_detection (see the first sketch after this list).
  4. Update dataset.TreeDataset (class TreeDataset(Dataset):) to 1) yield the correct data format for torchvision versus transformers, and 2) be flexible across geometries. This is a fairly old class and could be refactored. See determine_geometry_type(df) for a utility for reading geometry.
  5. Check the transforms for hardcoded normalization values; in torchvision this happens within the model, not in preprocessing (see the second sketch after this list).
  6. predict_step needs to be flexible to geometry and to torchvision versus transformers output (boxes = visualize.format_boxes(result); see the third sketch after this list).
  7. Write tests for predict_image and predict_tile that cover both torchvision and transformers models.
  8. Document transformers-specific logic: which models can be used? How consistent is their input architecture?
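
To make step 3 concrete, here is a minimal sketch of the format gap, following the Hugging Face object detection docs; the helper to_transformers_annotations is hypothetical and not part of DeepForest:

import torch

def to_transformers_annotations(target, image_id=0):
    """Convert a torchvision-style target to COCO-style annotations.

    torchvision detection models consume {"boxes": [[xmin, ymin, xmax, ymax], ...],
    "labels": [...]}; transformers image processors consume COCO-style
    {"image_id": ..., "annotations": [{"bbox": [x, y, w, h], ...}, ...]}.
    """
    annotations = []
    for box, label in zip(target["boxes"], target["labels"]):
        xmin, ymin, xmax, ymax = box.tolist()
        annotations.append({
            "bbox": [xmin, ymin, xmax - xmin, ymax - ymin],
            "category_id": int(label),
            "area": (xmax - xmin) * (ymax - ymin),
            "iscrowd": 0,
        })
    return {"image_id": image_id, "annotations": annotations}

A DETR-style image processor can then resize, normalize, and convert these to the model's training format, e.g. processor(images=image, annotations=to_transformers_annotations(target), return_tensors="pt").

For step 5, a quick way to see where normalization lives in each stack:

from torchvision.models.detection import fasterrcnn_resnet50_fpn
from transformers import AutoImageProcessor

# torchvision: normalization happens inside the model's forward pass
tv_model = fasterrcnn_resnet50_fpn(weights=None)
print(tv_model.transform.image_mean, tv_model.transform.image_std)

# transformers: normalization happens in the processor, before the model
hf_processor = AutoImageProcessor.from_pretrained("google/owlv2-base-patch16-ensemble")
print(hf_processor.image_mean, hf_processor.image_std)

And for step 6, a hypothetical adapter (not existing DeepForest code) from post-processed transformers detections to the xmin/ymin/xmax/ymax/label/score frame that visualize.format_boxes produces for torchvision results:

import pandas as pd

def transformers_to_df(detections, label_names):
    """detections is one element of a processor's post_process_* output,
    a dict with "boxes", "scores", and "labels" tensors."""
    boxes = detections["boxes"].tolist()
    return pd.DataFrame({
        "xmin": [b[0] for b in boxes],
        "ymin": [b[1] for b in boxes],
        "xmax": [b[2] for b in boxes],
        "ymax": [b[3] for b in boxes],
        "label": [label_names[int(l)] for l in detections["labels"]],
        "score": detections["scores"].tolist(),
    })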

bw4sz added the API label on Feb 20, 2025
bw4sz added this to the DeepForest 2.0 milestone on Feb 20, 2025
naxatra2 (Contributor) commented Mar 7, 2025

Hi @bw4sz,
I read this issue because it seems interesting. After going through the linked Hugging Face resources, I have a doubt:

Are we going to add wildlife detection models using transformers, or tree crown detection? From what I read, Owl-ViT v2 is a general-purpose open-vocabulary detector, not the kind of specialized, trainable tree crown detector that DeepForest offers. Won't this make it less effective at recognizing the unique characteristics of tree crowns in forestry imagery? Transformers like this are not well optimized for detecting overlapping tree crowns, which is particularly problematic in forestry, where canopy overlap detection is crucial.

There is already a Faster R-CNN model in deepforest/models, so integrating Owl-ViT v2 into a pipeline that already uses it could introduce unnecessary computational overhead without yielding significant improvements in accuracy.

But I still tried to create a simple file for the model; is something like this what we are supposed to implement? The file would go in the deepforest/models directory as owl.py.

from transformers import Owlv2Processor, Owlv2ForObjectDetection
from PIL import Image
import torch
from deepforest.model import Model

class OwlV2Model(Model):

    def __init__(self, config, **kwargs):
        super().__init__(config)

    def load_model(self):
        """Load Owl-ViT v2 for open-vocabulary object detection."""
        processor = Owlv2Processor.from_pretrained("google/owlv2-base-patch16-ensemble")
        model = Owlv2ForObjectDetection.from_pretrained("google/owlv2-base-patch16-ensemble")
        return processor, model

    def detect_objects(self, image_path, texts, threshold=0.5):
        processor, model = self.load_model()

        image = Image.open(image_path).convert("RGB")
        inputs = processor(text=texts, images=image, return_tensors="pt")

        with torch.no_grad():
            outputs = model(**inputs)

        # Use the processor's post-processing to apply the sigmoid, filter by
        # confidence, and rescale the normalized boxes to the original image
        # size; raw outputs.logits should not be thresholded directly.
        target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
        detections = processor.post_process_object_detection(
            outputs=outputs, threshold=threshold, target_sizes=target_sizes)[0]

        # One result line per detection, mapping label indices back to queries
        results = []
        for box, score, label in zip(detections["boxes"], detections["scores"],
                                     detections["labels"]):
            results.append(
                f"Detected {texts[int(label)]} at {[round(v, 1) for v in box.tolist()]} "
                f"with confidence {score.item():.2f}")

        return results
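
For a quick smoke test of the class above (the image path and the text query are placeholders, and config=None assumes the base Model class tolerates an empty config):

model = OwlV2Model(config=None)
for line in model.detect_objects("tree_image.png", ["a tree crown"]):
    print(line)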

I am new to this topic and have been reading the Hugging Face docs, so if there are any improvements or other useful resources, please suggest them and I will start working on this.
