Improved handling of enum columns in pk #412
Open
https://jira.percona.com/browse/PT-1591
https://jira.percona.com/browse/PT-1572
These tickets describe a problem with how the underlying NibbleIterator treats enum values in keys. Take a column c enum('a', 'z', 'b') as an example. MySQL's ORDER BY clause sorts by enum index value ('a', 'z', 'b'), but the NibbleIterator specifies boundaries as enum name strings, e.g., WHERE c <= 'b', which MySQL evaluates as an alphanumeric comparison, not an enum index comparison. That comparison excludes 'z' rows because 'z' is not alphanumerically <= 'b', yet ORDER BY returns 'z' rows before 'b' rows because it sorts by enum index value; thus, 'z' rows may be missed in the nibble. The solution provided in the tickets quoted above changes the ORDER BY clause to force conversion of the enum to a string, which in turn forces creation of a new sort index for each nibble. On large tables, this is unreasonably slow (it would have taken 32 years to convert our table of 1.7 billion rows).
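The mismatch can be illustrated with a small toy model. This is a pure-Python sketch of the semantics described above, not MySQL and not NibbleIterator code; the sample rows and the `enum_index` helper are made up for illustration.

```python
# Toy model of the enum ordering problem; names and data are hypothetical.
ENUM_VALUES = ["a", "z", "b"]  # models: c enum('a', 'z', 'b'), indexes 1, 2, 3


def enum_index(name):
    """Return the 1-based index of an enum name in its definition."""
    return ENUM_VALUES.index(name) + 1


rows = ["b", "a", "z", "a", "b", "z"]

# ORDER BY c sorts by enum index, so 'z' rows come before 'b' rows.
ordered = sorted(rows, key=enum_index)

# The nibble's upper boundary is the last row in that order: 'b'.
upper = ordered[-1]

# WHERE c <= 'b' evaluated as a string comparison on enum names.
by_name = [r for r in ordered if r <= upper]

# The same boundary evaluated on enum index values.
by_index = [r for r in ordered if enum_index(r) <= enum_index(upper)]

# Rows that ORDER BY placed inside the nibble but the name comparison skipped.
missed = [r for r in ordered if r not in by_name]

print(ordered)   # ['a', 'a', 'z', 'z', 'b', 'b']
print(by_name)   # ['a', 'a', 'b', 'b']
print(missed)    # ['z', 'z']
```

Comparing on index values (`by_index`) selects every row up to the boundary, which is the behavior the nibble needs.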
The solution in this patch ensures that every reference to an enum column uses its enum index value. In MySQL, this is done by casting the enum column to UNSIGNED; when an enum value is cast to UNSIGNED, its enum index value is used. This lets the ORDER BY clause sort naturally in PK order by enum index value, and it lets comparisons operate correctly, since they compare index values as UNSIGNEDs instead of performing an alphanumeric comparison on the enum name values. The patch removes all references to enum columns by their name values. Because the natural order of the key can then be used in the ORDER BY, the forced conversion of enum values to strings goes away, and with it the need for MySQL to create a sort index for each nibble. In practice, this patch reduces the duration of a schema update on our tables from years to hours, and I hope it is a more desirable fix for the issue recorded in the tickets mentioned above. In summary: reference enums by enum index values, always.
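As a rough sketch of the idea, the snippet below builds a boundary clause that compares an enum column by index value. The `nibble_where` helper and the exact shape of the SQL are assumptions for illustration only; the statements the patch actually generates may differ.

```python
# Hypothetical helper; not taken from the patch or from NibbleIterator.
def enum_index(enum_values, name):
    """Return the 1-based enum index that MySQL yields for CAST(col AS UNSIGNED)."""
    return enum_values.index(name) + 1


def nibble_where(column, enum_values, upper_name):
    """Build an upper-boundary clause that compares by enum index, not by name."""
    idx = enum_index(enum_values, upper_name)
    # The boundary is an integer index, and ORDER BY uses the column's
    # natural (index) order, so no string conversion is involved.
    return f"WHERE CAST(`{column}` AS UNSIGNED) <= {idx} ORDER BY `{column}`"


clause = nibble_where("c", ["a", "z", "b"], "b")
print(clause)  # WHERE CAST(`c` AS UNSIGNED) <= 3 ORDER BY `c`
```

With the boundary expressed as index 3, rows holding 'z' (index 2) fall inside the nibble, matching the order in which ORDER BY returns them.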