Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compliance of a sort optimization #2307

Open
jmdyck opened this issue Feb 13, 2021 · 12 comments
Open

compliance of a sort optimization #2307

jmdyck opened this issue Feb 13, 2021 · 12 comments
Labels

Comments

@jmdyck
Copy link
Collaborator

jmdyck commented Feb 13, 2021

In Array.p.sort, if _comparefn_ is *undefined*, items are compared by comparing their ToString results. Rather than calling ToString twice per call to SortCompare, it seems like an implementation might want to call ToString on each item exactly once, sort the resulting strings, and then arrange the original items accordingly (aka the decorate-sort-undecorate pattern).

However, calls to ToString can be observed, and this optimization would definitely change the number of those calls (from roughly 2 n log(n) to just n). So does that mean that this optimization is non-compliant? (If so, do implementations use it anyway?)

@ljharb
Copy link
Member

ljharb commented Feb 13, 2021

how would "accordingly" work if two of the strings are the same, but the underlying toStringable objects are distinct?

@bakkot
Copy link
Contributor

bakkot commented Feb 13, 2021

So does that mean that this optimization is non-compliant?

That's my understanding, yes.

If so, do implementations use it anyway?

Empirically, no:

$ eshost -se "[{ toString(){ print(1); return '1'; } }, { toString(){ print(2); return '2'; } }, { toString(){ print(3); return '3'; } }, { toString(){ print(4); return '4'; } }].sort()"
#### Chakra, JavaScriptCore, SpiderMonkey
1
2
3
4
1
2
3
4
1,2,3,4

#### GraalJS, Hermes, V8
2
1
3
2
4
3
1
2
3
4
1,2,3,4

#### XS
1
2
2
3
3
4
1
2
3
4
1,2,3,4

@jmdyck
Copy link
Collaborator Author

jmdyck commented Feb 13, 2021

how would "accordingly" work if two of the strings are the same, but the underlying toStringable objects are distinct?

If two strings are the same, then the underlying objects are in the same equivalence class, so they sort "together", but ordered as in the original input, to satisfy the stability requirement. So an object's 'decoration' would actually be, not just its ToString result, but a pair consisting of that string and also the object's position in the input array.

@jmdyck
Copy link
Collaborator Author

jmdyck commented Feb 13, 2021

@bakkot:
Thanks for running that test. I'm guessing that, in each case, the last 4 calls to toString are eshost figuring out how to print the resulting array. So if we omit those, we'll see just the toString calls involved in the sorting:

#### Chakra, JavaScriptCore, SpiderMonkey
1
2
3
4

#### GraalJS, Hermes, V8
2
1
3
2
4
3

#### XS
1
2
2
3
3
4

So it looks to me like Chakra, JavaScriptCore, and SpiderMonkey are using the strategy of calling ToString exactly once on each item, whereas the others exhibit behavior that's consistent with the spec (calling SortCompare 3 times with 3 distinct pairs of items).

@bakkot
Copy link
Contributor

bakkot commented Feb 13, 2021

I'm guessing that, in each case, the last 4 calls to toString are eshost figuring out how to print the resulting array

Hah, yes, you are of course correct. Using -x instead of -e so it doesn't print the result:

$ eshost -sx "[{ toString(){ print(1); return '1'; } }, { toString(){ print(2); return '2'; } }, { toString(){ print(3); return '3'; } }, { toString(){ print(4); return '4'; } }].sort()"
#### Chakra, JavaScriptCore, SpiderMonkey
1
2
3
4

#### GraalJS, Hermes, V8
2
1
3
2
4
3

#### XS
1
2
2
3
3
4

@kmiller68
Copy link
Contributor

So does that mean that this optimization is non-compliant?

That's my understanding, yes.

Hmm, it seems to me that it would be compliant but I may be biased... The number of calls to SortCompare is implementation defined and in order for the resulting array to not be implementation defined toString has to return a consistent result. Thus it seems to me that it's valid for an engine to cache the result of a each toString call as it can't change.

The main way I could see a problem is if there was an odd number of items in the array and nothing was toStringed twice. Although, I guess the engine could compare with list[size(list)], which I assume is undefined?

@waldemarhorwat
Copy link

Nice catch!

Some browsers cache the results (which is noncompliant with the current wording of the spec), some don't.

[{toString(){document.write(3); return '3';}}, {toString(){document.write(1); return '1';}}, {toString(){document.write(2); return '2';}}].sort(), "done"

Firefox caches and produces 312.
Chrome doesn't seem to cache and produces 13212321.

@lightmare
Copy link

Some browsers cache the results (which is noncompliant with the current wording of the spec), some don't.

I believe such caching is compliant. Quoting from the spec:

  1. Sort items using an implementation-defined sequence of calls to SortCompare.
    ...

The sort order is implementation-defined if any of the following conditions is true:
...

  • If comparefn is undefined and all applications of ToString, to any specific value passed as an argument to SortCompare, do not produce the same result.

It doesn't say the arguments passed to SortCompare have to be the actual elements of items. You can memoize ToString calls inside SortCompare, and pass the strings to subsequent comparisons.

As for pre-stringifying the whole list in advance, that still seems compliant because the implementation is allowed to call SortCompare any number of times. It could theoretically start by calling SortCompare with disjunct pairs of not-undefined elements — if there's only one, you're done sorting; if there's an odd number of those, you can compare the last one with any previously memoized string. In the end, you have called ToString once per element, and now you can start actually sorting the strings.

@devsnek
Copy link
Member

devsnek commented May 30, 2021

technically no calls at all is also an implementation defined sequence of calls

@jmdyck
Copy link
Collaborator Author

jmdyck commented May 31, 2021

It doesn't say the arguments passed to SortCompare have to be the actual elements of items.

That's true, it doesn't. I suspect that's an oversight, as older versions of the spec were explicit about the arguments in calls to SortCompare. Anyway, I don't think any other interpretation makes sense, given that:

  • a consistent comparison function for the elements of items is defined in terms of calls where the arguments are elements of items; and
  • the post-condition on the sorted array is expressed in terms of calls to SortCompare where the arguments are elements of items.

You can memoize ToString calls inside SortCompare, and pass the strings to subsequent comparisons.

An implementation could do that. That doesn't make it compliant.

As for pre-stringifying the whole list in advance, that still seems compliant because the implementation is allowed to call SortCompare any number of times.

Sure, but each call to SortCompare is specified to make 0 or 2 calls to ToString. If an implementation is observed to make an odd number of calls to ToString, then that would appear to be non-compliant behavior.

It could theoretically start by calling SortCompare with disjunct pairs of not-undefined elements — if there's only one, you're done sorting; if there's an odd number of those, you can compare the last one with any previously memoized string.

That would result in a call to ToString on the memoized string, which, if the corresponding element of items wasn't a String, would be observably different from calling ToString on that element. Instead, you might want to suggest comparing 'the last one' with any other not-undefined element.

So yes, a caching implementation could ensure that it always exhibited an even number of calls to ToString. But it doesn't look like they bother to do so.

@lightmare
Copy link

Anyway, I don't think any other interpretation makes sense, given that:

* a consistent comparison function for the elements of `items` is defined in terms of calls where the arguments are elements of `items`; and

* the post-condition on the sorted array is expressed in terms of calls to SortCompare where the arguments are elements of `items`.

These are expressed in terms of unrealized calls to SortCompare, i.e. "if you were to call SortCompare(a, b) for every possible combination of elements, this is what it should return". An implementation is certainly not required to be O(n³) just to check the CCF condition. Given:

  • comparefn === undefined
  • aCachedString = (a === undefined ? undefined : ToString(a))
  • bCachedString = (b === undefined ? undefined : ToString(b))

then:

  • SortCompare(a, b) and
  • SortCompare(aCachedString, bCachedString)

are only distinguishable by observing the ToString call.
For the purposes of pre- and post-conditions on unrealized calls, they are indistinguishable.

@jmdyck
Copy link
Collaborator Author

jmdyck commented May 31, 2021

These are expressed in terms of unrealized calls to SortCompare,

Agreed, which is why I didn't present them as refuting your claim (that SortCompare's args aren't required to be elements of items), but merely indicating that it was probably inconsistent with the intent.

But yes, if SortCompare's args aren't required to be elements of items, then (even if all elements are objects with observable ToString) it would be possible for a call to SortCompare to produce exactly one observable call to ToString, so A.p.sort() wouldn't require an even number of observable calls to ToString.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants