compliance of a sort optimization #2307

jmdyck · 2021-02-13T05:05:32Z

In Array.p.sort, if _comparefn_ is *undefined*, items are compared by comparing their ToString results. Rather than calling ToString twice per call to SortCompare, it seems like an implementation might want to call ToString on each item exactly once, sort the resulting strings, and then arrange the original items accordingly (aka the decorate-sort-undecorate pattern).

However, calls to ToString can be observed, and this optimization would definitely change the number of those calls (from roughly 2 n log(n) to just n). So does that mean that this optimization is non-compliant? (If so, do implementations use it anyway?)

The text was updated successfully, but these errors were encountered:

ljharb · 2021-02-13T05:12:29Z

how would "accordingly" work if two of the strings are the same, but the underlying toStringable objects are distinct?

bakkot · 2021-02-13T05:50:39Z

So does that mean that this optimization is non-compliant?

That's my understanding, yes.

If so, do implementations use it anyway?

Empirically, no:

$ eshost -se "[{ toString(){ print(1); return '1'; } }, { toString(){ print(2); return '2'; } }, { toString(){ print(3); return '3'; } }, { toString(){ print(4); return '4'; } }].sort()"
#### Chakra, JavaScriptCore, SpiderMonkey
1
2
3
4
1
2
3
4
1,2,3,4

#### GraalJS, Hermes, V8
2
1
3
2
4
3
1
2
3
4
1,2,3,4

#### XS
1
2
2
3
3
4
1
2
3
4
1,2,3,4

jmdyck · 2021-02-13T14:25:54Z

how would "accordingly" work if two of the strings are the same, but the underlying toStringable objects are distinct?

If two strings are the same, then the underlying objects are in the same equivalence class, so they sort "together", but ordered as in the original input, to satisfy the stability requirement. So an object's 'decoration' would actually be, not just its ToString result, but a pair consisting of that string and also the object's position in the input array.

jmdyck · 2021-02-13T14:47:57Z

@bakkot:
Thanks for running that test. I'm guessing that, in each case, the last 4 calls to toString are eshost figuring out how to print the resulting array. So if we omit those, we'll see just the toString calls involved in the sorting:

#### Chakra, JavaScriptCore, SpiderMonkey
1
2
3
4

#### GraalJS, Hermes, V8
2
1
3
2
4
3

#### XS
1
2
2
3
3
4

So it looks to me like Chakra, JavaScriptCore, and SpiderMonkey are using the strategy of calling ToString exactly once on each item, whereas the others exhibit behavior that's consistent with the spec (calling SortCompare 3 times with 3 distinct pairs of items).

bakkot · 2021-02-13T15:48:30Z

I'm guessing that, in each case, the last 4 calls to toString are eshost figuring out how to print the resulting array

Hah, yes, you are of course correct. Using -x instead of -e so it doesn't print the result:

$ eshost -sx "[{ toString(){ print(1); return '1'; } }, { toString(){ print(2); return '2'; } }, { toString(){ print(3); return '3'; } }, { toString(){ print(4); return '4'; } }].sort()"
#### Chakra, JavaScriptCore, SpiderMonkey
1
2
3
4

#### GraalJS, Hermes, V8
2
1
3
2
4
3

#### XS
1
2
2
3
3
4

kmiller68 · 2021-02-15T05:32:53Z

So does that mean that this optimization is non-compliant?

That's my understanding, yes.

Hmm, it seems to me that it would be compliant but I may be biased... The number of calls to SortCompare is implementation defined and in order for the resulting array to not be implementation defined toString has to return a consistent result. Thus it seems to me that it's valid for an engine to cache the result of a each toString call as it can't change.

The main way I could see a problem is if there was an odd number of items in the array and nothing was toStringed twice. Although, I guess the engine could compare with list[size(list)], which I assume is undefined?

waldemarhorwat · 2021-03-06T00:53:45Z

Nice catch!

Some browsers cache the results (which is noncompliant with the current wording of the spec), some don't.

[{toString(){document.write(3); return '3';}}, {toString(){document.write(1); return '1';}}, {toString(){document.write(2); return '2';}}].sort(), "done"

Firefox caches and produces 312.
Chrome doesn't seem to cache and produces 13212321.

lightmare · 2021-05-30T21:16:58Z

Some browsers cache the results (which is noncompliant with the current wording of the spec), some don't.

I believe such caching is compliant. Quoting from the spec:

Sort items using an implementation-defined sequence of calls to SortCompare.
...

The sort order is implementation-defined if any of the following conditions is true:
...

If comparefn is undefined and all applications of ToString, to any specific value passed as an argument to SortCompare, do not produce the same result.

It doesn't say the arguments passed to SortCompare have to be the actual elements of items. You can memoize ToString calls inside SortCompare, and pass the strings to subsequent comparisons.

As for pre-stringifying the whole list in advance, that still seems compliant because the implementation is allowed to call SortCompare any number of times. It could theoretically start by calling SortCompare with disjunct pairs of not-undefined elements — if there's only one, you're done sorting; if there's an odd number of those, you can compare the last one with any previously memoized string. In the end, you have called ToString once per element, and now you can start actually sorting the strings.

devsnek · 2021-05-30T21:22:28Z

technically no calls at all is also an implementation defined sequence of calls

jmdyck · 2021-05-31T02:01:20Z

It doesn't say the arguments passed to SortCompare have to be the actual elements of items.

That's true, it doesn't. I suspect that's an oversight, as older versions of the spec were explicit about the arguments in calls to SortCompare. Anyway, I don't think any other interpretation makes sense, given that:

a consistent comparison function for the elements of items is defined in terms of calls where the arguments are elements of items; and
the post-condition on the sorted array is expressed in terms of calls to SortCompare where the arguments are elements of items.

You can memoize ToString calls inside SortCompare, and pass the strings to subsequent comparisons.

An implementation could do that. That doesn't make it compliant.

As for pre-stringifying the whole list in advance, that still seems compliant because the implementation is allowed to call SortCompare any number of times.

Sure, but each call to SortCompare is specified to make 0 or 2 calls to ToString. If an implementation is observed to make an odd number of calls to ToString, then that would appear to be non-compliant behavior.

It could theoretically start by calling SortCompare with disjunct pairs of not-undefined elements — if there's only one, you're done sorting; if there's an odd number of those, you can compare the last one with any previously memoized string.

That would result in a call to ToString on the memoized string, which, if the corresponding element of items wasn't a String, would be observably different from calling ToString on that element. Instead, you might want to suggest comparing 'the last one' with any other not-undefined element.

So yes, a caching implementation could ensure that it always exhibited an even number of calls to ToString. But it doesn't look like they bother to do so.

lightmare · 2021-05-31T08:36:27Z

Anyway, I don't think any other interpretation makes sense, given that:

* a consistent comparison function for the elements of `items` is defined in terms of calls where the arguments are elements of `items`; and

* the post-condition on the sorted array is expressed in terms of calls to SortCompare where the arguments are elements of `items`.

These are expressed in terms of unrealized calls to SortCompare, i.e. "if you were to call SortCompare(a, b) for every possible combination of elements, this is what it should return". An implementation is certainly not required to be O(n³) just to check the CCF condition. Given:

comparefn === undefined
aCachedString = (a === undefined ? undefined : ToString(a))
bCachedString = (b === undefined ? undefined : ToString(b))

then:

SortCompare(a, b) and
SortCompare(aCachedString, bCachedString)

are only distinguishable by observing the ToString call.
For the purposes of pre- and post-conditions on unrealized calls, they are indistinguishable.

jmdyck · 2021-05-31T15:55:29Z

These are expressed in terms of unrealized calls to SortCompare,

Agreed, which is why I didn't present them as refuting your claim (that SortCompare's args aren't required to be elements of items), but merely indicating that it was probably inconsistent with the intent.

But yes, if SortCompare's args aren't required to be elements of items, then (even if all elements are objects with observable ToString) it would be possible for a call to SortCompare to produce exactly one observable call to ToString, so A.p.sort() wouldn't require an even number of observable calls to ToString.

jmdyck added the question label Feb 13, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

compliance of a sort optimization #2307

compliance of a sort optimization #2307

jmdyck commented Feb 13, 2021

ljharb commented Feb 13, 2021

bakkot commented Feb 13, 2021

jmdyck commented Feb 13, 2021

jmdyck commented Feb 13, 2021

bakkot commented Feb 13, 2021

kmiller68 commented Feb 15, 2021

waldemarhorwat commented Mar 6, 2021

lightmare commented May 30, 2021

devsnek commented May 30, 2021

jmdyck commented May 31, 2021

lightmare commented May 31, 2021

jmdyck commented May 31, 2021

compliance of a sort optimization #2307

compliance of a sort optimization #2307

Comments

jmdyck commented Feb 13, 2021

ljharb commented Feb 13, 2021

bakkot commented Feb 13, 2021

jmdyck commented Feb 13, 2021

jmdyck commented Feb 13, 2021

bakkot commented Feb 13, 2021

kmiller68 commented Feb 15, 2021

waldemarhorwat commented Mar 6, 2021

lightmare commented May 30, 2021

devsnek commented May 30, 2021

jmdyck commented May 31, 2021

lightmare commented May 31, 2021

jmdyck commented May 31, 2021