Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update delpaths function to use sort with slicing #3040

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 6 additions & 4 deletions src/jv_aux.c
Original file line number Diff line number Diff line change
Expand Up @@ -500,8 +500,10 @@ jv jv_delpaths(jv object, jv paths) {
jv_free(paths);
return jv_invalid_with_msg(jv_string("Paths must be specified as an array"));
}
paths = jv_sort(paths, jv_copy(paths));
jv_array_foreach(paths, i, elem) {
int paths_length = jv_array_length(jv_copy(paths));
paths = jv_sort(jv_array_slice(paths, 0, paths_length), jv_copy(paths));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't be the fix, right? I mean, the problem is that we need to canonicalize array indices in paths first, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jv_array_slice(a, 0, length_of_a) should be a no-op, too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this patch is trying to do either

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicowilliams
is this what you're suggesting?

jv canonicalize_indices(jv paths) {
  int paths_length = jv_array_length(jv_copy(paths));
  for (int i = 0; i < paths_length; i++) {
    jv path = jv_array_get(jv_copy(paths), i);
    int index = jv_number_value(path);
    if (index < 0) {
      index = paths_length + index;
    }
    if (index < 0 || index >= paths_length) {
      jv_free(paths);
      return jv_invalid_with_msg(jv_string("Invalid array index"));
    }
    paths = jv_array_set(paths, i, jv_number(index));
  }
  return paths;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To canonicalize the paths we need to refer to the tree they refer to. The idea is to take any negative indices in the paths and normalize them to positive indices, but to do this you need to know the length of the corresponding array in the tree for that negative index in the path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (index < 0) index = paths_length + index;? Why would you want to do that? What bug did you solve with those patches?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so to convert a negative index into a positive index, we need the length of the subtree, this could work then

jv canonicalize_indices(jv paths, jv tree) {
  int paths_length = jv_array_length(jv_copy(paths));
  for (int i = 0; i < paths_length; i++) {
    jv path = jv_array_get(jv_copy(paths), i);
    jv subtree = tree;
    int path_length = jv_array_length(jv_copy(path));
    for (int j = 0; j < path_length; j++) {
      jv step = jv_array_get(jv_copy(path), j);
      if (jv_is_number(step)) {
        int index = jv_number_value(step);
        if (index < 0) {
          int subtree_length = jv_array_length(jv_copy(subtree));
          index = subtree_length + index;
          if (index < 0) {
            // Invalid index, handle error
            jv_free(paths);
            return jv_invalid_with_msg(jv_string("Invalid array index"));
          }
          path = jv_array_set(path, j, jv_number(index));
        }
        subtree = jv_array_get(jv_copy(subtree), index);
      } else if (jv_is_string(step)) {
        subtree = jv_object_get(jv_copy(subtree), step);
      } else {
        // Invalid path step, handle error
        jv_free(paths);
        return jv_invalid_with_msg(jv_string("Invalid path step"));
      }
    }
    paths = jv_array_set(paths, i, path);
  }
  return paths;
}

Copy link
Member

@emanuele6 emanuele6 Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With canonicalising a path, we mean converting:

["foo",{"start":10,"end":null},{"start":-200,"end":-2},20]
["foo",-20,"hello"]

To (if foo is an array with 120 elements)

["foo",30]
["foo",100,"hello"]

You need to know the input to do that; if foo were an array with 300 elements you would want this instead:

["foo",120]
["foo",280,"hello"]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's a bit messy. Because path elements could overlap because of the slice specifiers, we might have to explode {"start":...,"end":...} into the individual indices that match those slice specifiers.

@capak07 the context here is that internally a "slice" specifier like $something[start:end] is represented as an object with a "start" and "end" keys, and that the start and end indices can be negative to mean "starting from the end of this string or array". Here jv_sort() just can't possibly do the right thing. We should either have a jv_sort_paths() that does, or we should normalize the paths to be given to jv_sort(), and the latter seems like the better path forward for now because the former will likely require that we use qsort_r() (which isn't portable, so we might have to import one from a BSD).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not reject object indices (they are slices), and del(.[100]) on a short array should not be an error, so you should not start throwing an invalid path error

capak07 marked this conversation as resolved.
Show resolved Hide resolved
for(int i = 0; i < paths_length; i++) {
jv elem = jv_array_get(jv_copy(paths), i);
if (jv_get_kind(elem) != JV_KIND_ARRAY) {
jv_free(object);
jv_free(paths);
Expand All @@ -512,7 +514,7 @@ jv jv_delpaths(jv object, jv paths) {
}
jv_free(elem);
}
if (jv_array_length(jv_copy(paths)) == 0) {
if (paths_length == 0) {
// nothing is being deleted
jv_free(paths);
return object;
Expand All @@ -522,7 +524,7 @@ jv jv_delpaths(jv object, jv paths) {
jv_free(paths);
jv_free(object);
return jv_null();
}
}
return delpaths_sorted(object, paths, 0);
}

Expand Down
4 changes: 4 additions & 0 deletions tests/jq.test
Original file line number Diff line number Diff line change
Expand Up @@ -1023,6 +1023,10 @@ try delpaths(0) catch .
{}
"Paths must be specified as an array"

map(delpaths([[6], [7], [8], [9], [10]]))
capak07 marked this conversation as resolved.
Show resolved Hide resolved
[[0, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]]
[[0, 1, 2, 3, 4, 5]]

del(.), del(empty), del((.foo,.bar,.baz) | .[2,3,0]), del(.foo[0], .bar[0], .foo, .baz.bar[0].x)
{"foo": [0,1,2,3,4], "bar": [0,1]}
null
Expand Down