fix: Allow companion functions when result type is not resolvable given intermediate type #11999
@rui-mo Thank you for taking this on. I remember we needed to bypass this check as well to make the decimal aggregate functions work. Or do we no longer need that change?
@zhztheplayer Thanks for the pointer! In Gluten we expect all the names to end with "_merge_extract"; however, with this logic the names may have other suffixes. For instance, we would obtain the names for sum and average as follows. I'll investigate further to see how to integrate this part with Gluten.
Sounds reasonable. Looking forward to a solution here. Thanks!
  if (auto func = getAggregateFunctionEntry(name)) {
    auto fn = func->factory(
        core::AggregationNode::Step::kFinal,
        argTypes,
-       originalResultType,
+       resultType,
Hi @rui-mo, it's not universally correct to use resultType here. The reason is that resultType is an argument received by the factory of the merge-extract function (i.e., the lambda starting at line 337). This factory is called in the HashAggregation constructor for individual aggregation nodes, which can perform the partial aggregation step, the intermediate aggregation step, etc. Suppose an aggregation node performs the intermediate aggregation step of the merge-extract function; then both the argTypes and the resultType received by the factory at line 337 are the intermediate type of the original function. But when we do auto fn = func->factory(...), we're creating the original aggregation function, so the result type passed to this factory should be the result type of the original aggregation function.
(This change doesn't trigger any test error because AggregationTestBase::testAggregationsWithCompanion() currently doesn't test functions with the merge-extract companion function. We should add that coverage.)
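The concern above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the `Step` enum, `receivedResultType`, and `forwardingIsCorrect` are hypothetical stand-ins rather than Velox APIs, types are plain strings, and the partial step is assumed to behave like the intermediate step (both emit the intermediate type).

```cpp
#include <cassert>
#include <string>

// Toy stand-in for the aggregation steps; not the real Velox enum.
enum class Step { kPartial, kIntermediate, kFinal, kSingle };

// Hypothetical model of the result type handed to the merge-extract factory
// for a plan node running `step`. Assumption: partial and intermediate nodes
// output the intermediate type of the original function, while final and
// single nodes output the original function's result type.
std::string receivedResultType(
    Step step,
    const std::string& intermediateType,
    const std::string& originalResultType) {
  switch (step) {
    case Step::kPartial:
    case Step::kIntermediate:
      return intermediateType;
    case Step::kFinal:
    case Step::kSingle:
      return originalResultType;
  }
  return originalResultType; // Not reached; keeps compilers quiet.
}

// Forwarding the received result type to the original function's factory is
// only right when it equals the original function's result type.
bool forwardingIsCorrect(
    Step step,
    const std::string& intermediateType,
    const std::string& originalResultType) {
  return receivedResultType(step, intermediateType, originalResultType) ==
      originalResultType;
}
```

In this model, forwarding is fine for final and single nodes but hands the intermediate type to the original function's factory for partial and intermediate nodes, which is the case the comment warns about.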
> Suppose an aggregation node performs the intermediate aggregation step of the merge-extract function; then both the argTypes and the resultType received by the factory at line 337 are the intermediate type of the original function.
Can we break down the cases here? E.g., when the step is single / final, we just skip the result type resolution?
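One way to read this suggestion is sketched below. It is a toy model, not the PR's code: `Step`, `Resolver`, and `originalResultType` are hypothetical names, types are plain strings, and the resolver stands in for whatever logic derives the original result type from the intermediate type (returning `std::nullopt` when it cannot).

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Toy stand-in for the aggregation steps; not the real Velox enum.
enum class Step { kPartial, kIntermediate, kFinal, kSingle };

// Hypothetical resolver: maps an intermediate type to the original function's
// result type, or std::nullopt when the result type is not resolvable.
using Resolver =
    std::function<std::optional<std::string>(const std::string&)>;

// For single / final steps the passed-in result type is assumed to already be
// the original function's result type, so resolution is skipped. For other
// steps the result type still has to be resolved from the intermediate type.
std::optional<std::string> originalResultType(
    Step step,
    const std::string& passedInResultType,
    const std::string& intermediateType,
    const Resolver& resolve) {
  if (step == Step::kFinal || step == Step::kSingle) {
    return passedInResultType;
  }
  return resolve(intermediateType);
}
```

Under this split, a function whose result type is not resolvable from its intermediate type would still work in single and final steps, while partial and intermediate steps would need some other source for the original result type.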
> Can we break down the cases here? E.g., when step is single / final, we just skip the result type resolution?
@kagamiori I apologize for the delayed response; I was on vacation. Do you think @zhztheplayer's suggestion above makes sense? We are proposing this change to allow more flexibility in aggregate function registration, especially for the Spark decimal average.
@rui-mo Thank you for working on this. I'm developing a new Spark
For the first and second issues, perhaps we can simply remove the limitation and the check? For the fourth issue, maybe we can use the actual return type to resolve the return type.
The registration of partial and merge companion functions does not require
that the result type be resolvable given the intermediate type. This PR removes
that limitation for them. The registration of the merge_extract companion
function used to depend on this resolution; this PR uses the passed-in result
type rather than the one resolved from the intermediate type.