[FEATURE REQUEST] Support union field #12778

Open
gsanon opened this issue Feb 5, 2025 · 1 comment

gsanon commented Feb 5, 2025

Describe the problem you faced

Currently it is not possible to write data to Hudi if the source DataFrame contains a field with a union type.

Say you have an Avro schema with a union-type field:

{
  "name": "field",
  "type":
  [
    "null",
    "int",
    "string",
    "boolean"
  ],
  "default": null
}

If you create a Parquet file from this schema, the field is converted into a struct with one memberX entry for each non-null type of the union: field: struct<member0:int, member1:string, member2:boolean>
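
To make the naming convention concrete, here is a minimal sketch in plain Python that derives the struct form quoted above from the Avro field definition. The helper `union_to_member_struct` is hypothetical and only handles primitive type names. It also assumes that a two-branch `["null", T]` union is treated as a plain nullable `T` rather than a member struct. It is an illustration of the mapping, not the converter's actual code.

```python
import json


def union_to_member_struct(avro_field: dict) -> str:
    """Render the struct<memberN:type> form for an Avro union field (illustrative only)."""
    branches = avro_field["type"]
    if not isinstance(branches, list):
        raise ValueError("field type is not a union")
    # "null" only makes the field nullable; it gets no member of its own.
    non_null = [b for b in branches if b != "null"]
    # Assumption: a union with a single non-null branch stays a plain nullable type.
    if len(non_null) == 1:
        return non_null[0]
    members = ", ".join(f"member{i}:{t}" for i, t in enumerate(non_null))
    return f"struct<{members}>"


avro_field = json.loads(
    '{"name": "field", "type": ["null", "int", "string", "boolean"], "default": null}'
)
print(union_to_member_struct(avro_field))
```

For the schema in this issue, the sketch prints the same `struct<member0:int, member1:string, member2:boolean>` shape described above.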

When writing this data to Hudi 0.x, the process fails because Hudi takes only the first type of the union and then tries to write the struct into that type. In our case you get something like:
java.lang.IllegalArgumentException: StructType(StructField(member0,IntegerType,true),StructField(member1,StringType,true),StructField(member2,BooleanType,true)) and IntegerType are incompatible:

This behavior was a bit opaque in 0.x, but in 1.0.0 it has been made pretty clear here

So each time we encounter Parquet files containing union-type fields, we need to pre-process the data as an inelegant workaround (renaming the memberX fields to avoid the union type detection).

Given that both Parquet and Avro allow this union type, we could expect Hudi to handle it one way or another (for example, by representing the member struct as it is?). Wdyt?
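
A rough sketch of that renaming step, in plain Python: it builds a Spark SQL expression that rebuilds the struct with the memberX fields renamed. The helper `rename_members_expr` and the `variant` prefix are hypothetical names chosen for illustration. The only assumption about Spark is that the `named_struct` SQL function is available.

```python
import re
from typing import List

# Matches the memberN names produced for union branches.
MEMBER_NAME = re.compile(r"member(\d+)")


def rename_members_expr(column: str, member_names: List[str], prefix: str = "variant") -> str:
    """Build a SQL expression that copies a struct while renaming its memberN fields."""
    parts = []
    for name in member_names:
        match = MEMBER_NAME.fullmatch(name)
        # Keep the branch index so the original union order stays recoverable.
        new_name = f"{prefix}{match.group(1)}" if match else name
        parts.append(f"'{new_name}', `{column}`.`{name}`")
    return f"named_struct({', '.join(parts)}) AS `{column}`"


expr = rename_members_expr("field", ["member0", "member1", "member2"])
print(expr)
```

The resulting string could be passed to `DataFrame.selectExpr` together with the other columns before the Hudi write. One caveat: `named_struct` returns a non-null struct even when the source struct is null, so rows with a null union value would need separate handling.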

@ad1happy2go
Collaborator

@gsanon As the code also shows, this was never supported. In 0.x we were also using only the first type, but we did not have a validation check, which was causing the issue.
I agree it can be supported. Feel free to contribute if you are interested. Thanks.

Status: Awaiting Triage