[SPARK-51322][SQL] Better error message for streaming subquery expression #50088

Open: wants to merge 1 commit into master

cloud-fan
Contributor

What changes were proposed in this pull request?

Today, if a user creates a subquery expression over a streaming query (using the new DataFrame subquery API, or by manipulating logical plans directly), they hit confusing errors:

  • For uncorrelated subquery expressions, Spark invokes the batch planner to plan the subquery and fails because the batch planner does not recognize streaming plans.
  • For correlated subquery expressions in a batch query, the subquery is rewritten to joins, but the outer batch query is still passed to the batch planner, which fails with the same error as above.
  • For correlated subquery expressions in a streaming query, the streaming execution does not descend into subqueries to replace StreamingRelationV2 with StreamingDataSourceV2ScanRelation. After subquery rewriting, the StreamingRelationV2 remains and Spark fails at runtime because StreamingRelationExec cannot be executed.

This PR proposes to check for streaming subquery expressions and fail earlier.
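
To illustrate the idea of failing early, here is a minimal sketch in Python over a toy plan tree. It is not the code in this PR and does not use Spark's classes: `Plan`, `AnalysisError`, and `check_no_streaming_subquery` are hypothetical names invented for this example, and the error text is made up. It only shows the general shape of such a check: walk the plan, and if any subquery expression wraps a streaming plan, raise a clear error before any planner runs.

```python
from dataclasses import dataclass, field
from typing import List


class AnalysisError(Exception):
    """Stand-in for an analysis-time error (hypothetical, not Spark's AnalysisException)."""


@dataclass
class Plan:
    """Toy logical plan node (hypothetical, not Spark's LogicalPlan)."""
    name: str
    streaming_source: bool = False  # True for a streaming leaf relation
    children: List["Plan"] = field(default_factory=list)
    # Plans referenced by subquery expressions attached to this node.
    subqueries: List["Plan"] = field(default_factory=list)

    @property
    def is_streaming(self) -> bool:
        # A plan is streaming if it, or any of its children, reads a streaming source.
        return self.streaming_source or any(c.is_streaming for c in self.children)


def check_no_streaming_subquery(plan: Plan) -> None:
    """Raise before planning if any subquery expression wraps a streaming plan."""
    for sub in plan.subqueries:
        if sub.is_streaming:
            raise AnalysisError(
                f"Subquery expression under '{plan.name}' "
                f"contains streaming plan '{sub.name}'"
            )
        check_no_streaming_subquery(sub)  # also cover nested subqueries
    for child in plan.children:
        check_no_streaming_subquery(child)


# Usage: a batch query whose subquery reads a streaming source is rejected up front.
stream = Plan("rate_stream", streaming_source=True)
query = Plan(
    "filter",
    children=[Plan("batch_table")],
    subqueries=[Plan("aggregate", children=[stream])],
)
try:
    check_no_streaming_subquery(query)
except AnalysisError as err:
    print(f"rejected early: {err}")
```

In this sketch a streaming outer query with no streaming subquery still passes the check; only the subquery position is rejected, which mirrors the three cases listed above.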

Why are the changes needed?

To give users a better error message.

Does this PR introduce any user-facing change?

Yes, but only the error message changes.

How was this patch tested?

new test

Was this patch authored or co-authored using generative AI tooling?

no

@cloud-fan
Contributor Author

cc @HeartSaVioR @viirya

@github-actions github-actions bot added the SQL label Feb 26, 2025
@viirya
Member

viirya commented Feb 26, 2025

KafkaSourceStressForDontFailOnDataLossSuite failed, but it looks unrelated.

[info]   Cause: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: Failed to fetch metadata for partition failOnDataLoss-9-0 because metadata for topic `failOnDataLoss-9` could not be found

@the-sakthi
Member

LGTM

Contributor

@HeartSaVioR HeartSaVioR left a comment

+1


4 participants