Generate shuffles compatible with existing partitioning order #566

senderista · 2017-07-20T20:53:35Z

Since #565 was merged, reordering join conditions no longer produces an incorrect query plan, but it can still force an unnecessary shuffle (this happens in the referenced integration test). When one input of a join is already partitioned on the join attributes and the other input is not partitioned, the shuffle generated for the unpartitioned input should use a partitioning order compatible with the already-partitioned input. If we let the order of join conditions determine the order of partitioning attributes (which we do now), this will generally not be the case, and we will generate an unnecessary shuffle for the already-partitioned input. Since the ordering of partitioning attributes has no user-visible significance, there is no reason to let it be determined by the order of conditions in a query.
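To make the problem concrete, here is a minimal sketch of an order-sensitive hash partitioner. It is not Myria's or Raco's actual hashing code: `attr_hash`, `ordered_hash`, `partition_for`, and the multiply-by-31 combining scheme are hypothetical stand-ins chosen for illustration. The point is only that the same key values combined in a different attribute order give a different hash, so inputs partitioned on `(x, y)` and on `(y, x)` are not co-partitioned.

```python
import zlib


def attr_hash(value):
    """Deterministic per-attribute hash (hypothetical stand-in)."""
    return zlib.crc32(repr(value).encode("utf-8"))


def ordered_hash(values):
    """Combine per-attribute hashes in a way that depends on their order."""
    h = 17
    for v in values:
        h = (31 * h + attr_hash(v)) & 0xFFFFFFFF
    return h


def partition_for(row, attrs, num_workers):
    """Pick a worker for `row`, hashing the attributes in the given order."""
    return ordered_hash([row[a] for a in attrs]) % num_workers


row = {"x": 1, "y": 2}
# Same key values, different attribute order: the combined hashes differ.
print(ordered_hash([row["x"], row["y"]]) == ordered_hash([row["y"], row["x"]]))  # False
```

Because the combined hashes differ, nothing guarantees that a row lands on the same worker under both orders, which is why the planner has to reshuffle one side.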

senderista · 2017-07-20T21:05:47Z

The fix should be simple to test without extending FakeDB to model physical representation: generate the query plan for the second query in uwescience/myria#905 and verify that it does not contain a shuffle above the already-partitioned table's scan.

senderista · 2017-07-21T05:02:00Z

Perhaps we could entirely avoid the need for ensuring compatible ordering by changing our hashing method to be independent of attribute order. One simple approach would be to separately hash each attribute, then XOR the results. This would be incompatible with existing hash-partitioned relations, of course (if updated code were applied to an existing cluster). This approach seems much simpler than trying to ensure a canonical ordering of attributes (which is impossible in general since different relations may not share the names of corresponding attributes in their join keys).
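A minimal sketch of the XOR idea, assuming a deterministic per-attribute hash. `attr_hash` and `xor_hash` are hypothetical names and CRC32 is just a convenient stand-in, not the hash function Myria uses.

```python
import zlib
from functools import reduce
from operator import xor


def attr_hash(value):
    """Deterministic per-attribute hash (hypothetical stand-in)."""
    return zlib.crc32(repr(value).encode("utf-8"))


def xor_hash(values):
    """Hash each attribute separately, then XOR the results together."""
    return reduce(xor, (attr_hash(v) for v in values), 0)


# XOR is commutative and associative, so attribute order does not matter.
print(xor_hash(["alice", 42]) == xor_hash([42, "alice"]))  # True
```

With a combiner like this, the partition of a row would depend only on the set of key values, so the order in which join conditions are written could not affect co-partitioning.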

senderista · 2017-07-23T14:58:30Z

Actually, XOR is probably not a great choice, since any key in which every value is repeated an even number of times would hash to 0 (for example, any two-attribute key whose two values are equal). We only need the combining function to be commutative; it does not need to be self-inverse like XOR, so something like the sum of all the hashes modulo a large prime should work better.
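The difference between the two combiners can be sketched as follows. This is an illustration under assumptions, not a proposed implementation: `attr_hash`, `xor_hash`, `sum_hash`, and the choice of the Mersenne prime 2^61 - 1 as the modulus are all hypothetical.

```python
import zlib
from functools import reduce
from operator import xor

# A large prime modulus (2**61 - 1 is a Mersenne prime); the choice is an assumption.
PRIME = (1 << 61) - 1


def attr_hash(value):
    """Deterministic per-attribute hash (hypothetical stand-in)."""
    return zlib.crc32(repr(value).encode("utf-8"))


def xor_hash(values):
    """Order-independent, but equal values cancel each other out."""
    return reduce(xor, (attr_hash(v) for v in values), 0)


def sum_hash(values):
    """Order-independent: sum of per-attribute hashes modulo a large prime."""
    return sum(attr_hash(v) for v in values) % PRIME


print(xor_hash([5, 5]) == 0)                  # True: the repeated value cancels
print(sum_hash([5, 5]) == 0)                  # False: the sum keeps both contributions
print(sum_hash([1, 2]) == sum_hash([2, 1]))   # True: addition is commutative
```

Both combiners ignore attribute order, but only XOR sends every key of the form `(v, v)` to the same partition.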

senderista added bug enhancement labels Jul 20, 2017

senderista self-assigned this Jul 20, 2017

This was referenced Jul 20, 2017

force ordering of partitioning attributes #565

Merged

Order ignored in partition attributes #515

Closed
