Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add List, Dict, Tuple and NamedTuple to the GenericDType bound #1556

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

sam-goodwin
Copy link
Contributor

Closes #1555

Comment on lines 62 to 65
List[Any],
Dict[Any, Any],
Tuple[Any],
NamedTuple,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, I can't make the type parameters more precise?

Copy link
Contributor Author

@sam-goodwin sam-goodwin Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cosmicBboy I think this change is ready to go, LMK if you see any issue with leaving these types wide.

Outstanding question is whether NamedTuple belongs there though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you spot-check this with mypy?

NamedTuple should be fine here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think I've patched the MyPy problems:

> pre-commit run mypy --all-files
mypy.....................................................................Passed
- hook id: mypy
- duration: 2.27s

Success: no issues found in 180 source files

Signed-off-by: Sam Goodwin <[email protected]>
Signed-off-by: Sam Goodwin <[email protected]>
Copy link

codecov bot commented Apr 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.07%. Comparing base (4df61da) to head (8ab65a5).
Report is 75 commits behind head on main.

❗ Current head 8ab65a5 differs from pull request most recent head 9dc8ed5. Consider uploading reports for the commit 9dc8ed5 to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1556       +/-   ##
===========================================
- Coverage   94.29%   83.07%   -11.22%     
===========================================
  Files          91      111       +20     
  Lines        7024     8191     +1167     
===========================================
+ Hits         6623     6805      +182     
- Misses        401     1386      +985     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@cosmicBboy
Copy link
Collaborator

cosmicBboy commented Apr 10, 2024

Thanks @sam-goodwin, see https://pandera.readthedocs.io/en/latest/CONTRIBUTING.html#set-up-pre-commit for steps to make sure linters and unit tests are passing. You'll also need to sign your commits: https://pandera.readthedocs.io/en/latest/CONTRIBUTING.html#dco-signing-commits

@cosmicBboy
Copy link
Collaborator

Mypy errors:

tests/core/test_typing.py:498: error: "list" is not subscriptable, use "typing.List" instead  [misc]
tests/core/test_typing.py:499: error: "dict" is not subscriptable, use "typing.Dict" instead  [misc]
tests/core/test_typing.py:500: error: "tuple" is not subscriptable, use "typing.Tuple" instead  [misc]

Note that pandera needs to support python 3.8 as well, so we need to use the generic types in the typing module.

Failing unit test:

FAILED tests/core/test_typing.py::test_complex_python_collection_types - pandera.errors.SchemaError: expected series 'list' to have type list[pandera.dtypes.Int32]:
failure cases:
   index failure_case
0      0       [1, 2]
1      1    [3, 4, 5]

Looks like you need to use the built-in int type? pandera.dtypes.Int32 translates to the numpy dtype for pandas columns.

@sam-goodwin
Copy link
Contributor Author

Looks like you need to use the built-in int type? pandera.dtypes.Int32 translates to the numpy dtype for pandas columns.

Do you mean we can't specify ints with specific precision in a List or Dict in pandera?

@cosmicBboy
Copy link
Collaborator

cosmicBboy commented Apr 24, 2024

Do you mean we can't specify ints with specific precision in a List or Dict in pandera?

This just follows the way pandas deals with data. Columns containing list or dict objects are just python objects, meaning they're not numpy arrays. This might be different for pyarrow data representations, but that'll be something to tackle when adding pyarrow support #1262.

In summary, pandera.dtypes.Int32 maps onto a numpy.int32, and a list[numpy.int32] isn't meaningful in the context of pandas. list[int] does tho, and will contain just lists of python ints.

@cosmicBboy
Copy link
Collaborator

@sam-goodwin friendly ping: one of the unit tests is still failing: https://github.com/unionai-oss/pandera/actions/runs/8861081819/job/24332580434?pr=1556

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pandera Series generic argument does not allow the typing.List etc. types
2 participants