Unify from_json and parse_tabular implementations #545

Open
dtulga opened this issue Oct 28, 2024 · 4 comments
Labels
enhancement New feature or request housekeeping

Comments

@dtulga
Contributor

dtulga commented Oct 28, 2024

This issue tracks unifying the existing from_json and from_jsonl implementations with those behind parse_tabular, from_csv, and from_parquet. The goal is to consolidate dynamic model generation and schema inference across these import functions. Current functionality (such as jmespath support) should be preserved, so the implementations likely cannot be identical, but they should share similar dynamic model generation, schema inference, etc. Ideally this also removes the dependency on datamodel-code-generator.
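
To make the idea concrete, here is a rough sketch of what a shared "infer schema, then build a model dynamically" step could look like. It is not the project's code: `rows_from_jsonl`, `infer_schema`, and `make_model` are hypothetical names, and stdlib dataclasses stand in for whatever model class the real implementation would generate.

```python
import json
from dataclasses import field, make_dataclass
from typing import Any, Optional


def rows_from_jsonl(text: str) -> list[dict]:
    """Hypothetical reader: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def infer_schema(rows: list[dict]) -> dict[str, Any]:
    """Hypothetical shared inference step: map each key to one Python type."""
    schema: dict[str, Any] = {}
    for row in rows:
        for key, value in row.items():
            if value is None:
                continue  # a null says nothing about the column type
            current = type(value)
            previous = schema.get(key, current)
            if previous is current:
                schema[key] = current
            elif {previous, current} == {int, float}:
                schema[key] = float  # widen mixed int/float columns
            else:
                schema[key] = Any  # conflicting types: give up on precision
    return schema


def make_model(name: str, schema: dict[str, Any]) -> type:
    """Build a class at runtime; keys must be valid Python identifiers."""
    fields = [(key, Optional[tp], field(default=None)) for key, tp in schema.items()]
    return make_dataclass(name, fields)


rows = rows_from_jsonl('{"id": 1, "score": 0.5}\n{"id": 2, "score": 1, "tag": "x"}\n')
schema = infer_schema(rows)
Row = make_model("Row", schema)
records = [Row(**row) for row in rows]
```

Any reader that yields a list of dicts (CSV, Parquet, JSON after a jmespath query) could feed the same two functions, which is the kind of sharing described above. The sketch only builds the class and does not coerce values.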

@dtulga dtulga self-assigned this Oct 28, 2024
@dtulga dtulga added enhancement New feature or request housekeeping labels Oct 28, 2024
@dtulga
Contributor Author

dtulga commented Oct 29, 2024

This reference may be helpful in the future, as it covers pyarrow's support for reading JSON: https://arrow.apache.org/docs/python/generated/pyarrow.json.read_json.html

@dtulga dtulga removed their assignment Oct 29, 2024
@shcheklein
Member

thanks @dtulga !

@PanGan21

PanGan21 commented Nov 7, 2024

Hi, I am wondering if this issue is open for contribution under some guidance 🙂

@shcheklein
Member

@PanGan21 hi, yes, absolutely. Please take a look at the parse_tabular and from_json implementations, especially the part that depends on datamodel-code-generator - that's the hackiest part, and the one we would like to get rid of. Let us know if something is not clear. It may not be the simplest task tbh, but it can be an interesting one!


3 participants