
Fix the error caused by non-decodable bytes when retrieving the non-empty domain for TILEDB_ASCII dimensions #2164

Merged Feb 20, 2025 (2 commits)

@kounelisagis (Member) commented Feb 20, 2025

As discovered in TileDB-ML, calling the `Array::_non_empty_domain` pybind11 function on a `TILEDB_ASCII` dimension containing bytes that are not valid UTF-8 resulted in errors such as:
`UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 0: invalid start byte`
`pybind11::make_tuple` converts a `std::string` argument into a Python `str`, which assumes the input is valid UTF-8, so this case requires special handling.
This PR fixes the issue and also ensures that an exception is thrown for dimensions of Unicode string type, as these are not supported until TileDB 2.27.0.
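The failure can be reproduced in plain Python without TileDB: `0x81` is not a valid UTF-8 start byte, so strict decoding to `str` fails, while keeping the value as `bytes` does not. This is a minimal sketch; `decode_as_str` is a hypothetical helper standing in for pybind11's `std::string` to `str` conversion, not TileDB-Py code.

```python
# A byte that is not a valid UTF-8 start byte, as in the reported error.
raw = b"\x81"


def decode_as_str(value: bytes) -> str:
    """Hypothetical stand-in for the std::string -> str conversion: strict UTF-8 decoding."""
    return value.decode("utf-8")


try:
    decode_as_str(raw)
except UnicodeDecodeError as exc:
    # Prints: 'utf-8' codec can't decode byte 0x81 in position 0: invalid start byte
    print(exc)

# Keeping the value as bytes avoids decoding entirely, so nothing can fail here.
domain_bound = bytes(raw)
print(domain_bound)  # b'\x81'
```

Returning `bytes` leaves any decoding decision to the caller, who knows whether the dimension holds text.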


[sc-63472]

@ihnorton (Member) left a comment

LGTM after double-checking the return type matches 0.32

@ktsitsi commented Feb 20, 2025

Confirming changes fix the issue in TileDB-ML.

@kounelisagis (Member, Author) commented

> LGTM after double-checking the return type matches 0.32

Both the structure of the value returned by `Array::_non_empty_domain` (a tuple of tuples, one per dimension) and the type of the values in each inner tuple (`bytes`, in this case) remain the same.
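For illustration, a caller consuming this layout might look like the sketch below. It assumes only what is stated above (a tuple of per-dimension tuples whose `TILEDB_ASCII` bounds are `bytes`); the concrete domain values and the `describe_domain` helper are hypothetical, and no TileDB API is called.

```python
# Hypothetical non-empty domain for an array with one TILEDB_ASCII dimension;
# the values are illustrative, not taken from a real array.
non_empty_domain = ((b"\x81abc", b"zz"),)


def describe_domain(domain):
    """Render each dimension's (lower, upper) bytes bounds as printable text."""
    lines = []
    for dim_index, (lower, upper) in enumerate(domain):
        # errors="backslashreplace" escapes invalid UTF-8 instead of raising.
        lo = lower.decode("utf-8", errors="backslashreplace")
        hi = upper.decode("utf-8", errors="backslashreplace")
        lines.append(f"dim {dim_index}: [{lo}, {hi}]")
    return lines


for line in describe_domain(non_empty_domain):
    print(line)  # dim 0: [\x81abc, zz]
```

Decoding only for display keeps the raw `bytes` available for exact comparisons or slicing.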
