fix: raise UploadErrors for images and annotations #266

caiquejjx · 2024-06-23T20:14:09Z

Description

Fixes issue #235 by raising UploadError exceptions that happens inside single_upload

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)

How has this change been tested, please provide a testcase or example of how you tested the change?

Same test as the issue

import json
import roboflow

rf = roboflow.Roboflow(api_key="API_KEY")
img_path = "<path_to_img>"

workspace_id = "<ws ID>"
project_id = "<proj ID>"
workspace = rf.workspace(workspace_id)
project = workspace.project(project_id)

annotation = {
    "it is": [
        {
            "very frustrating": 0,
            "when you": "send",
            "an annotation": {
                "file and": 0.5,
                
                "function": 0.5,
                "call": 0.5,
                "completes": 0.5,
                "successfully,": 0.5,

            },
            "but": "there's",
            "no": "annotations",
            "on": "the website",
            "nor": "any",
            "errors": "raised",
        }
    ]
}
with open("annotation.json", "w") as f:
    json.dump(annotation, f)

image = project.upload(img_path, annotation_path="annotation.json")

Any specific deployment considerations

For example, documentation changes, usability, usage/costs, secrets, etc.

Docs

Docs updated? What were the changes:

LinasKo · 2024-06-26T07:34:50Z

Hey @caiquejjx 👋

Typically, we ask for a Colab to showcase the changes, which significantly speeds up the review process.

I made one for us this time:
https://colab.research.google.com/drive/1xNixTC2xljopKhSpatDhYr2pCY0b5HvG#scrollTo=xI1cabXdatjr

I like the change, but wonder what the broader impact would be. Whether there's anyone relying on us failing silently.

caiquejjx · 2024-06-26T13:56:01Z

Thanks @LinasKo! yeah it makes sense

CLAassistant · 2024-07-12T14:09:33Z

All committers have signed the CLA.

LinasKo · 2024-08-13T09:29:12Z

Hi @caiquejjx, I'll be looking at this, and if you're around - let's get it merged!

I've tested the solution and it works well, but I'd like a slightly safer approach to parsing the error.

Here's what I have in mind:

    def _parse_upload_error(self, error: rfapi.UploadError) -> str:
        dict_part = str(error).split(": ", 2)[2]
        # Could be done with ast.literal_eval, but this is safer.
        if re.search(r"'\w+':", dict_part):
            temp_str = dict_part.replace(r"\'", "<PLACEHOLDER>")
            temp_str = temp_str.replace("'", '"')
            dict_part = temp_str.replace("<PLACEHOLDER>", "'")
        parsed_dict: dict = json.loads(dict_part)
        message = parsed_dict.get("message")
        return message or str(parsed_dict)

Could you replace the method (and import re instead of ast), and then test it?

Let's also add a correct annotation test, with annotation format such as this:

Annotations Dict

annotation = {
    "info": {
        "year": "2020",
        "version": "1",
        "description": "None",
        "contributor": "Linas",
        "url": "https://app.roboflow.ai/datasets/hard-hat-sample/1",
        "date_created": "2000-01-01T00:00:00+00:00",
    },
    "licenses": [{"id": 1, "url": "https://creativecommons.org/publicdomain/zero/1.0/", "name": "Public Domain"}],
    "categories": [
        {"id": 0, "name": "cat", "supercategory": "animals"},
    ],
    "images": [
        {
            "id": 0,
            "license": 1,
            "file_name": img_path,
            "height": 1024,
            "width": 1792,
            "date_captured": "2020-07-20T19:39:26+00:00",
        }
    ],
    "annotations": [
        {
            "id": 0,
            "image_id": 0,
            "category_id": 0,
            "bbox": [45, 2, 85, 85],
            "area": 7225,
            "segmentation": [],
            "iscrowd": 0,
        },
        {
            "id": 1,
            "image_id": 0,
            "category_id": 0,
            "bbox": [324, 29, 72, 81],
            "area": 5832,
            "segmentation": [],
            "iscrowd": 0,
        },
    ],
}

caiquejjx · 2024-08-13T13:49:16Z

Hey @LinasKo sure thing, I'll work on these soon and I guess we should also have automated tests right? (not sure if it was what you meant)

LinasKo · 2024-08-13T13:53:19Z

Aye, I wasn't too clear. A Colab will be fine this time - no need for unit tests.

caiquejjx · 2024-08-14T00:22:03Z

@LinasKo done, I had to recreate the colab in order to edit

LinasKo

Good work spotting the missing replacings!

It's so tempting to revert back to ast.literal_eval at this point - seems so much simpler. Following the threads though, I see ways of crashing ast.literal_eval with just 301 symbols. If I knew whether the input is considered trusted, it would be an easy choice, but now I'd rather not take the chances.

I left one comment in the code. I think we're almost good to go.

roboflow/core/project.py

LinasKo · 2024-08-14T21:19:31Z

roboflow/core/project.py

+        if re.search(r"'\w+':", dict_part):
+            dict_part = dict_part.replace("True", "true")
+            dict_part = dict_part.replace("False", "false")
+            dict_part = dict_part.replace("None", "null")


What if re.search(r"'\w+':", dict_part) is false?

We still wish to load from json, but since we didn't run dict_part = dict_part.replace("True", "true"), it might not be valid.

I guess now I got it right

LinasKo

Thank you very much - looks good! 👍
Extra tests here show exactly what I'd expect: Colab

Upload successful for new image + annotation
Error saying annotations already exist, if you try again
Incorrect annotation format error if wrong annotations are passed in.

LinasKo · 2024-08-15T08:03:07Z

@tonylampada, could you please have a look at this? I believe it is ready to merge.

tonylampada · 2024-08-15T15:21:31Z

@LinasKo I think uploading with the CLI relies on the previous more lenient handling of annotation errors. Can you test that and see if it still has the same behaviour when importing datasets with the CLI?

Also, I think the backend already provides valid json in case of error, it just seems weird parsing it using regex like that.

LinasKo · 2024-08-15T16:04:43Z

@tonylampada, here's the repr of the error. It's not valid json - the strings are declared with '!.

UploadError("save annotation for MzqOjxfl22A5rxmFhrUI / bad response: 400: {'message': 'Image was already annotated.', 'type': 'InvalidImageException', 'hint': 'This image was already annotated; to overwrite the annotation, pass overwrite=true as a query parameter.'}")

I very much agree it's weird! The correct solution would be to change wherever the json is produced. Alternatively, let's use the whole string. Also, we can avoid all of this with ast.literal_eval, as per original suggestion by @caiquejjx, but that's a slight risk if the API error ever contains user-defined data.

CLI test results coming soon.

LinasKo · 2024-08-15T16:50:24Z

@tonylampada, I've tested:

Uploading image + annotation
Downloading object detection dataset (hard hats sample)
Importing hard hats into another obj detection project
No issues during upload, no issues on the site.

However, I also labelled a keypoints dataset, created a version and attempted to download it, which failed.
I think it's related to json, but unrelated to my changes.

Could you test if the following works?

roboflow download https://app.roboflow.com/linas-ws/keypooint-test/1 -l dataset-kp

Error response

  ❯ roboflow download https://app.roboflow.com/linas-ws/keypooint-test/1 -l dataset-kp
loading Roboflow workspace...
loading Roboflow project...
Traceback (most recent call last):
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/requests/models.py", line 974, in json
    return complexjson.loads(self.text, **kwargs)
  File "/Users/linasko/.pyenv/versions/3.8.19/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/Users/linasko/.pyenv/versions/3.8.19/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/linasko/.pyenv/versions/3.8.19/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/core/version.py", line 258, in export
    raise RuntimeError(response.json())
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/requests/models.py", line 978, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/linasko/rf/pr/roboflow-python/.venv/bin/roboflow", line 8, in <module>
    sys.exit(main())
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/roboflowpy.py", line 423, in main
    args.func(args)
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/roboflowpy.py", line 45, in download
    version.download(args.format, location=args.location, overwrite=True)
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/core/version.py", line 205, in download
    self.export(model_format)
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/core/version.py", line 260, in export
    response.raise_for_status()
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://api.roboflow.com/linas-ws/keypooint-test/1/voc?api_key=<MY_API_KEY>

LinasKo · 2024-08-16T07:45:01Z

@tonylampada, responding to your Loom, I see the bigger picture now. We're both working towards the same goal of not hiding useful error messages.

The problem is that dataset import captures the result of project.single_upload and logs it with _log_img_upload, whereas uploading a single image/annotation completely ignores the result.

The correct solution would keep one of these approaches. Should we implement an equivalent of _log_img_upload in roboflow/roboflowpy.py::upload_image(), and then remove the raise introduced by this PR?

tonylampada · 2024-08-16T10:49:31Z

Aforementioned loom here just for reference (so we don't exclude @caiquejjx from the conversation :-))
https://www.loom.com/share/5f7a6520e0364b7588fd6130c682e0ef

tonylampada · 2024-08-16T11:09:09Z

@LinasKo here's what I'm thinking.

project.single_upload has an "identity crisis" because it's not exactly "single" upload, is it? :-)
It can upload:

one image
one annotation (I doubt this is much used, but it's there)
one image, and then one annotation

Then there's the question of what do we do when something goes wrong with one of those operations.
Ideally we should decouple those in different methods, but that would be a breaking change, probably not worth it. So, scratch that.

Then, when should single_upload throw an error?
One possible answer is:

one image = RAISE if fails
one annotation = RAISE if fails
one image, and then one annotation
- RAISE if image fails
- RETURN image id + annotation error if annotation fails

I think this kinda makes sense from the perspective of what error means.
But I don't like it. Because it's a little more complex to implement inside (single_upload) and that complexity also leaks by forcing the caller to know when to handle and/or look inside the results.

So I'm tending to think that a better answer is:

Always return, never raise. The return value is a dictionary that contains the result for each operation (like: "image upload=ok, here's the id, annotation upload failed, here's the error")

Now, duplicating _log_img_upload is something I don't like. DRY, right?
Maybe we should move that responsibility into single_upload as well, before returning the result dic. wdyt?

caiquejjx · 2024-08-16T13:06:54Z

@tonylampada thanks for the loom! one note is that single_upload kinda did that before already but the problem was that the upload method didn't return anything and one of the reasons I went for raising is because people probably don't check the return of the upload since it doesn't returns anything, so besides the change on single_upload we should also start returning the result in upload, right? or maybe logging

tonylampada · 2024-08-16T13:37:14Z

we should also start returning the result in upload, right?

Yup. That makes sense

or maybe logging

I think it might be better to move the logging responsibility into single_upload.

lrosemberg · 2024-08-18T21:00:04Z

I'm writing some tests for my PR (that maintains the compatibility with CLI), and this PR breaks the upload parse in some cases, I'll describe it better below.

In the line where the split is done by a colon and you access position 2, not all returns from the rfapi.upload_image have the status code, and so they don't have this second colon.

@caiquejjx @LinasKo @tonylampada

This PR should fix that, and another things:

#312

fix: raise UploadErrors for images and annotations

678912a

caiquejjx mentioned this pull request Jun 23, 2024

Roboflow API fails silently when invalid annotations file is uploaded. #235

Open

fix: make sure a string is returned

eb7a69d

caiquejjx added 2 commits August 13, 2024 20:30

fix: change ast eval to regex operation

2aee263

fix: add missing replaces

85fa852

LinasKo requested changes Aug 14, 2024

View reviewed changes

roboflow/core/project.py Outdated Show resolved Hide resolved

fix: move keywords replace up

b9e0c87

LinasKo requested changes Aug 14, 2024

View reviewed changes

fix: move keywords replace above if

5033c10

LinasKo approved these changes Aug 15, 2024

View reviewed changes

LinasKo merged commit 1083a06 into roboflow:main Aug 15, 2024
6 checks passed

lrosemberg mentioned this pull request Aug 18, 2024

Maintains compatibility with CLI return, broken in PR #266 #309

Closed

1 task

lrosemberg mentioned this pull request Aug 20, 2024

Refactoring dataset upload #312

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: raise UploadErrors for images and annotations #266

fix: raise UploadErrors for images and annotations #266

caiquejjx commented Jun 23, 2024

LinasKo commented Jun 26, 2024

caiquejjx commented Jun 26, 2024

CLAassistant commented Jul 12, 2024 •

edited

Loading

LinasKo commented Aug 13, 2024

caiquejjx commented Aug 13, 2024

LinasKo commented Aug 13, 2024

caiquejjx commented Aug 14, 2024

LinasKo left a comment •

edited

Loading

LinasKo Aug 14, 2024

caiquejjx Aug 14, 2024

LinasKo left a comment •

edited

Loading

LinasKo commented Aug 15, 2024

tonylampada commented Aug 15, 2024

LinasKo commented Aug 15, 2024 •

edited

Loading

LinasKo commented Aug 15, 2024

LinasKo commented Aug 16, 2024 •

edited

Loading

tonylampada commented Aug 16, 2024

tonylampada commented Aug 16, 2024 •

edited

Loading

caiquejjx commented Aug 16, 2024

tonylampada commented Aug 16, 2024 •

edited

Loading

lrosemberg commented Aug 18, 2024 •

edited

Loading

fix: raise UploadErrors for images and annotations #266

fix: raise UploadErrors for images and annotations #266

Conversation

caiquejjx commented Jun 23, 2024

Description

Type of change

How has this change been tested, please provide a testcase or example of how you tested the change?

Any specific deployment considerations

Docs

LinasKo commented Jun 26, 2024

caiquejjx commented Jun 26, 2024

CLAassistant commented Jul 12, 2024 • edited Loading

LinasKo commented Aug 13, 2024

caiquejjx commented Aug 13, 2024

LinasKo commented Aug 13, 2024

caiquejjx commented Aug 14, 2024

LinasKo left a comment • edited Loading

Choose a reason for hiding this comment

LinasKo Aug 14, 2024

Choose a reason for hiding this comment

caiquejjx Aug 14, 2024

Choose a reason for hiding this comment

LinasKo left a comment • edited Loading

Choose a reason for hiding this comment

LinasKo commented Aug 15, 2024

tonylampada commented Aug 15, 2024

LinasKo commented Aug 15, 2024 • edited Loading

LinasKo commented Aug 15, 2024

LinasKo commented Aug 16, 2024 • edited Loading

tonylampada commented Aug 16, 2024

tonylampada commented Aug 16, 2024 • edited Loading

caiquejjx commented Aug 16, 2024

tonylampada commented Aug 16, 2024 • edited Loading

lrosemberg commented Aug 18, 2024 • edited Loading

CLAassistant commented Jul 12, 2024 •

edited

Loading

LinasKo left a comment •

edited

Loading

LinasKo left a comment •

edited

Loading

LinasKo commented Aug 15, 2024 •

edited

Loading

LinasKo commented Aug 16, 2024 •

edited

Loading

tonylampada commented Aug 16, 2024 •

edited

Loading

tonylampada commented Aug 16, 2024 •

edited

Loading

lrosemberg commented Aug 18, 2024 •

edited

Loading