Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved handling of enum columns in pk #412

Open
wants to merge 1 commit into
base: 3.x
Choose a base branch
from

Conversation

tgriffitts-vs
Copy link

@tgriffitts-vs tgriffitts-vs commented Sep 1, 2019

https://jira.percona.com/browse/PT-1591
https://jira.percona.com/browse/PT-1572

These tickets describe a problem with the underlying NibbleIterator in the way it treats enum values in keys. In short, e.g., c enum('a', 'z', 'b'), the issue is that MySQL ORDER BY clause orders by enum index value ('a', 'z', 'b') and the NibbleIterator specifies boundaries as enum name value strings, e.g., WHERE c <= 'b', which are treated as alphanumeric comparisons-- not enum index value comparisons. In MySQL, this returns only 'a' because 'a' is alphanumerically <= 'b', but the ORDER BY returns 'z' rows before 'b' rows because ORDER BY is sorting by enum index values; thus, 'z' rows may be missed in the nibble. The solution provided in the tickets quoted above changes the ORDER BY clause to force conversion of the enum to a string, which in turn forces creation of a new sort index each nibble. On large tables, this is unreasonably slow (it would have taken 32 years to convert our table of 1.7billion rows).

The solution in this patch assures all references to enum columns are referenced by their enum index value. In MySQL, this is done by casting an enum column to an UNSIGNED; when an enum value is cast to an UNSIGNED, its enum index value is used. This allows the ORDER BY clause to naturally sort by the PK sort order using the enum index value, and also allows comparisons to operate correctly on enum index values, since they are comparing the index values as UNSIGNEDs instead of performing an alphanumeric comparison on the enum name values. This patch removes all references to enum columns by their name values. This enables the natural order of the key to be used in the ORDER BY, removing the forced conversion of enum values to strings, which in turn removes the need for MySQL to create a sort index for each nibble. The practical result of this patch reduces the duration for a schema update of our tables from years to hours, and I hope is a more desired fix for the issue recorded in the tickets mentioned above. In summary: reference enums by enum index values, always.

@it-percona
Copy link
Contributor

it-percona commented Sep 1, 2019

CLA assistant check
All committers have signed the CLA.

Copy link
Collaborator

@svetasmirnova svetasmirnova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution.

You modified the auto-generated code. It will be removed once we run the update-modules utility. Please put your changes into the TableNibbler package instead.

@percona-csalguero
Copy link
Contributor

Hi,
As Sveta mentioned, these changes should be applied on NibbleIterator.pm and then they must be applied to all programs using that package with something like:

for t in bin/*; do util/update-modules $t NibbleIterator; done

@percona-csalguero
Copy link
Contributor

Please also:

  1. Fix merge conflicts
  2. This should be merged into the 3.x branch, not in 3.0

@percona-csalguero
Copy link
Contributor

@svetasmirnova Since this issue seems to be abandoned, can we close it?

@svetasmirnova
Copy link
Collaborator

@percona-csalguero let me review if changes are safe first. We can probably adjust the patch ourselves if it can help us to improve speed of handling ENUM indexes without loosing data.

@svetasmirnova svetasmirnova changed the base branch from 3.0 to 3.x November 7, 2023 13:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants