feat(spark)!: Transpile ANY to EXISTS #4305

Merged
merged 2 commits into from
Oct 29, 2024

VaggelisD (Collaborator) commented Oct 29, 2024

Fixes #4298

In the Hive dialect hierarchy, ANY is an aggregate function over BOOLEAN expressions rather than an array/subquery operator as in other dialects, so a transpiled query such as the following fails in Spark:

>>> sqlglot.parse_one("WITH t AS (SELECT ARRAY[1, 2, 3] AS col) SELECT * FROM t WHERE 1 <= ANY(col)", dialect="postgres").sql("spark")
'WITH t AS (SELECT ARRAY(1, 2, 3) AS col) SELECT * FROM t WHERE 1 <= ANY(col)'

spark-sql (default)> WITH t AS (SELECT ARRAY(1, 2, 3) AS col) SELECT * FROM t WHERE 1 <= ANY(col);
[DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "any(col)" due to data type mismatch: Parameter 1 requires the "BOOLEAN" type, however "col" has the type "ARRAY<INT>".; line 1 pos 68;
...
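To make the mismatch concrete, here is a minimal Python model of the two meanings of ANY. It is only an illustration: the function names `quantified_any` and `aggregate_any` are hypothetical, NULL handling is ignored, and neither engine is implemented this way.

```python
import operator
from typing import Callable, Iterable, Sequence


def quantified_any(lhs, op: Callable[[object, object], bool], array: Sequence) -> bool:
    """Postgres-style `lhs <op> ANY(array)`: true if the comparison holds for some element."""
    return any(op(lhs, element) for element in array)


def aggregate_any(values: Iterable[object]) -> bool:
    """Spark-style aggregate `any(expr)`: expects one BOOLEAN value per row."""
    rows = list(values)
    for value in rows:
        if not isinstance(value, bool):
            # Loosely mirrors the DATATYPE_MISMATCH error shown above.
            raise TypeError(f"any() requires BOOLEAN input, got {type(value).__name__}")
    return any(rows)


# Postgres reading of `1 <= ANY(col)` with col = [1, 2, 3].
print(quantified_any(1, operator.le, [1, 2, 3]))  # True

# Spark reading of `ANY(col)`: the array itself is passed where a BOOLEAN is expected.
try:
    aggregate_any([[1, 2, 3]])
except TypeError as exc:
    print(exc)
```

The second call fails for the same reason the Spark query does: the argument is an array, not a boolean.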

However, Spark2+ supports the EXISTS() function, which can process both ARRAY expressions and subqueries in a similar fashion. This PR:

  1. Adds a transformation for the former (ARRAY) case, enabling the following path:
>>> sqlglot.parse_one("WITH t AS (SELECT ARRAY[1, 2, 3] AS col) SELECT * FROM t WHERE 1 <= ANY(col)", dialect="postgres").sql("spark")
'WITH t AS (SELECT ARRAY(1, 2, 3) AS col) SELECT * FROM t WHERE EXISTS(col, x -> 1 <= x)'

spark-sql (default)> WITH t AS (SELECT ARRAY(1, 2, 3) AS col) SELECT * FROM t WHERE EXISTS(col, x -> 1 <= x);
[1,2,3]
  2. Extends the Exists expression class to also inherit from Func, making it possible to parse and construct it as a proper function.
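The rewrite in step (1) can be pictured as a small AST transformation. The sketch below uses a toy AST with hypothetical node classes (`Name`, `Literal`, `AnyOf`, `Compare`, `Lambda`, `Exists`) and a hypothetical `any_to_exists` helper. It is not sqlglot's code and only handles a comparison whose right-hand side is ANY(...).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Name:  # a column or lambda variable
    ident: str


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class AnyOf:  # ANY(<array>)
    array: "Node"


@dataclass(frozen=True)
class Compare:  # <left> <op> <right>
    left: "Node"
    op: str
    right: "Node"


@dataclass(frozen=True)
class Lambda:  # <param> -> <body>
    param: Name
    body: "Node"


@dataclass(frozen=True)
class Exists:  # EXISTS(<array>, <lambda>)
    array: "Node"
    predicate: Lambda


Node = Union[Name, Literal, AnyOf, Compare, Lambda, Exists]


def any_to_exists(node: Node) -> Node:
    """Rewrite `lhs <op> ANY(arr)` into `EXISTS(arr, x -> lhs <op> x)`."""
    if isinstance(node, Compare) and isinstance(node.right, AnyOf):
        # Assumption: "x" does not clash with a column name in the query.
        x = Name("x")
        return Exists(node.right.array, Lambda(x, Compare(node.left, node.op, x)))
    return node


def to_sql(node: Node) -> str:
    """Render the toy AST back to a SQL string."""
    if isinstance(node, Name):
        return node.ident
    if isinstance(node, Literal):
        return str(node.value)
    if isinstance(node, AnyOf):
        return f"ANY({to_sql(node.array)})"
    if isinstance(node, Compare):
        return f"{to_sql(node.left)} {node.op} {to_sql(node.right)}"
    if isinstance(node, Lambda):
        return f"{to_sql(node.param)} -> {to_sql(node.body)}"
    if isinstance(node, Exists):
        return f"EXISTS({to_sql(node.array)}, {to_sql(node.predicate)})"
    raise TypeError(f"unsupported node: {node!r}")


predicate = Compare(Literal(1), "<=", AnyOf(Name("col")))
print(to_sql(predicate))                 # 1 <= ANY(col)
print(to_sql(any_to_exists(predicate)))  # EXISTS(col, x -> 1 <= x)
```

The comparison moves inside the lambda body and the array becomes the first argument of EXISTS, which matches the shape of the Spark output shown in step (1).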

Step (2) is not strictly required (an anonymous function could be built instead, see the first commit), so it is added as a standalone commit on top of (1). If we don't want to keep it, it is trivial to drop.

Docs

Postgres ANY for ARRAY | Postgres ANY for subqueries | Spark/Databricks ANY | Spark/Databricks EXISTS

@VaggelisD VaggelisD changed the title feat(spark): Transpile ANY to EXISTS feat(spark)!: Transpile ANY to EXISTS Oct 29, 2024
@VaggelisD VaggelisD merged commit e92904e into main Oct 29, 2024
6 checks passed
@VaggelisD VaggelisD deleted the vaggelisd/spark_any branch October 29, 2024 15:44
Linked issue (may be closed by this pull request): Bug In Parsing ANY function from postgres to spark