Integrate transformers library. #915

Open
bw4sz opened this issue Feb 8, 2025 · 2 comments
Labels
API This tag is used for small improvements to the readability and usability of the python API.

Comments

bw4sz (Collaborator) commented Feb 8, 2025

(Original message, repurposed for the general task.)
What would it take to get transformers as a dependency of DeepForest?

https://huggingface.co/docs/transformers/main/en/model_doc/owlv2

Seems interesting for open-set learning.

bw4sz changed the title from "Transformers library owlv2" to "Integrate transformers library." on Feb 20, 2025
bw4sz (Collaborator, Author) commented Feb 20, 2025

DeepForest currently relies on torchvision models; we would like to expand this to include Hugging Face's model set in transformers.

Roadmap

  1. Add transformers to the requirements, conda, and setup dependencies.
  2. Add a simple transformers model to https://github.com/weecology/DeepForest/tree/main/src/deepforest/models
  3. Add an attribute to deepforest.main that records whether we are using the torchvision or the transformers data format; the transformers format looks to be documented here: https://huggingface.co/docs/transformers/tasks/object_detection (see the first sketch after this list).
  4. Update dataset.TreeDataset (class TreeDataset(Dataset):) to 1) yield the correct data format for torchvision versus transformers, and 2) be flexible across geometries. This is a fairly old class and could be refactored. See determine_geometry_type(df) for a utility for reading geometry.
  5. Check the transforms for hardcoded normalization values; in torchvision this happens within the model, not in preprocessing (see the second sketch after this list).
  6. predict_step needs to be flexible to geometry and to torchvision versus transformers output (boxes = visualize.format_boxes(result); see the third sketch after this list).
  7. Write tests for predict_image and predict_tile that cover both torchvision and transformers models.
  8. Document transformers-specific logic: which models can be used? How consistent is their input architecture?
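
To make step 3 concrete, here is a minimal sketch of the format gap, following the Hugging Face object detection docs; the helper to_transformers_annotations is hypothetical and not part of DeepForest:

import torch

def to_transformers_annotations(target, image_id=0):
    """Convert a torchvision-style target to COCO-style annotations.

    torchvision detection models consume {"boxes": [[xmin, ymin, xmax, ymax], ...],
    "labels": [...]}; transformers image processors consume COCO-style
    {"image_id": ..., "annotations": [{"bbox": [x, y, w, h], ...}, ...]}.
    """
    annotations = []
    for box, label in zip(target["boxes"], target["labels"]):
        xmin, ymin, xmax, ymax = box.tolist()
        annotations.append({
            "bbox": [xmin, ymin, xmax - xmin, ymax - ymin],
            "category_id": int(label),
            "area": (xmax - xmin) * (ymax - ymin),
            "iscrowd": 0,
        })
    return {"image_id": image_id, "annotations": annotations}

A DETR-style image processor can then resize, normalize, and convert these to the model's training format, e.g. processor(images=image, annotations=to_transformers_annotations(target), return_tensors="pt").

For step 5, a quick way to see where normalization lives in each stack:

from torchvision.models.detection import fasterrcnn_resnet50_fpn
from transformers import AutoImageProcessor

# torchvision: normalization happens inside the model's forward pass
tv_model = fasterrcnn_resnet50_fpn(weights=None)
print(tv_model.transform.image_mean, tv_model.transform.image_std)

# transformers: normalization happens in the processor, before the model
hf_processor = AutoImageProcessor.from_pretrained("google/owlv2-base-patch16-ensemble")
print(hf_processor.image_mean, hf_processor.image_std)

And for step 6, a hypothetical adapter (not existing DeepForest code) from post-processed transformers detections to the xmin/ymin/xmax/ymax/label/score frame that visualize.format_boxes produces for torchvision results:

import pandas as pd

def transformers_to_df(detections, label_names):
    """detections is one element of a processor's post_process_* output,
    a dict with "boxes", "scores", and "labels" tensors."""
    boxes = detections["boxes"].tolist()
    return pd.DataFrame({
        "xmin": [b[0] for b in boxes],
        "ymin": [b[1] for b in boxes],
        "xmax": [b[2] for b in boxes],
        "ymax": [b[3] for b in boxes],
        "label": [label_names[int(l)] for l in detections["labels"]],
        "score": detections["scores"].tolist(),
    })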

bw4sz added the API label on Feb 20, 2025
bw4sz added this to the DeepForest 2.0 milestone on Feb 20, 2025
naxatra2 (Contributor) commented Mar 7, 2025

Hi @bw4sz,
I read this issue because it seems interesting. After going through the linked Hugging Face resources, I have a doubt:

Are we going to add wildlife detection models using transformers, or tree crown detection? From what I read, Owl-ViT v2 is a general-purpose open-vocabulary detector, not the kind of specialized, trainable tree crown detector that DeepForest offers. Won't this make it less effective at recognizing the unique characteristics of tree crowns in forestry imagery? Transformers like this are not well optimized for detecting overlapping tree crowns, which is particularly problematic in forestry, where canopy overlap detection is crucial.

There is already a Faster R-CNN model in deepforest/models, so integrating Owl-ViT v2 into a pipeline that already uses it could introduce unnecessary computational overhead without yielding significant improvements in accuracy.

But I still tried to create a simple file for the model; is something like this what we are supposed to implement? The file would go in the deepforest/models directory as owl.py.

from transformers import Owlv2Processor, Owlv2ForObjectDetection
from PIL import Image
import torch
from deepforest.model import Model

class OwlV2Model(Model):

    def __init__(self, config, **kwargs):
        super().__init__(config)

    def load_model(self):
        """Load Owl-ViT v2 for open-vocabulary object detection."""
        processor = Owlv2Processor.from_pretrained("google/owlv2-base-patch16-ensemble")
        model = Owlv2ForObjectDetection.from_pretrained("google/owlv2-base-patch16-ensemble")
        return processor, model

    def detect_objects(self, image_path, texts, threshold=0.5):
        processor, model = self.load_model()

        image = Image.open(image_path).convert("RGB")
        inputs = processor(text=texts, images=image, return_tensors="pt")

        with torch.no_grad():
            outputs = model(**inputs)

        # Use the processor's post-processing to apply the sigmoid, filter by
        # confidence, and rescale the normalized boxes to the original image
        # size; raw outputs.logits should not be thresholded directly.
        target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
        detections = processor.post_process_object_detection(
            outputs=outputs, threshold=threshold, target_sizes=target_sizes)[0]

        # One result line per detection, mapping label indices back to queries
        results = []
        for box, score, label in zip(detections["boxes"], detections["scores"],
                                     detections["labels"]):
            results.append(
                f"Detected {texts[int(label)]} at {[round(v, 1) for v in box.tolist()]} "
                f"with confidence {score.item():.2f}")

        return results
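
For a quick smoke test of the class above (the image path and the text query are placeholders, and config=None assumes the base Model class tolerates an empty config):

model = OwlV2Model(config=None)
for line in model.detect_objects("tree_image.png", ["a tree crown"]):
    print(line)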

I am new to this topic and have been reading the Hugging Face docs, so if there are any improvements or other useful resources, please suggest them and I will start working on this.
