Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-51299][SQL][UI] MetricUtils.stringValue should filter metric values with initValue rather than a hardcoded value #50055

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jiwen624
Copy link

@jiwen624 jiwen624 commented Feb 24, 2025

What changes were proposed in this pull request?

This PR proposes to use initValue of a metric in org.apache.spark.util.MetricUtils.stringValue instead of a hardcoded value 0 to filter out invalid metric values. A new field initValue is added to SQLMetricInfo and SQLPlanMetric so it can be passed into org.apache.spark.util.MetricUtils.stringValue to do the filtering accordingly.

Why are the changes needed?

In method org.apache.spark.util.MetricUtils.stringValue, it uses a hardcoded value 0 to filter out invalid metric values for SIZE_METRIC, TIMING_METRIC and NS_TIMING_METRIC:

val validValues = values.filter(_ >= 0)
However, in SQLMetrics it offers methods to create these types of metrics with initValue other than -1 (introduced in this PR #41555) :

def createSizeMetric(sc: SparkContext, name: String, initValue: Long = -1): SQLMetric = {
which means there is a chance that the metrics are created with a initValue != -1 and in this case the filter above will generate incorrect results.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing UTs

Was this patch authored or co-authored using generative AI tooling?

No.

@jiwen624 jiwen624 changed the title [WIP][SPARK-51299][SQL][Web UI] Filter out invalid metric values with initValue [WIP][SPARK-51299][SQL][UI] Filter out invalid metric values with initValue Feb 24, 2025
@jiwen624 jiwen624 changed the title [WIP][SPARK-51299][SQL][UI] Filter out invalid metric values with initValue [WIP][SPARK-51299][SQL][UI] MetricUtils.stringValue should filter metric values with initValue rather than a hardcoded value Feb 24, 2025
@jiwen624 jiwen624 changed the title [WIP][SPARK-51299][SQL][UI] MetricUtils.stringValue should filter metric values with initValue rather than a hardcoded value [SPARK-51299][SQL][UI] MetricUtils.stringValue should filter metric values with initValue rather than a hardcoded value Feb 24, 2025
@jiwen624
Copy link
Author

@dongjoon-hyun @cloud-fan could you let me know your thoughts about this fix? Thanks.

@cloud-fan
Copy link
Contributor

I'm not very convinced by the "Why" section. What's the end-to-end problem you hit?

BTW #47721 has some more context about the SQLMetric initial value.

@jiwen624
Copy link
Author

@cloud-fan Thanks a lot for the response and the context in #47721. It's very helpful.
From what I see in previous PRs and comments in the code, it seems that we treat initValue in SQLMetric as a invalid value which should be filtered out. For example, in this recent PR #44222, it mentions:

initValue is the starting value for a SQLMetric. If a metric has value equal to its initValue, then it can/should be filtered out before aggregating with SQLMetrics.stringValue().

and here the comment about the idea of having -1 indicates that it's originally introduced to "distinguish whether a SQLMetric is initialized or not" : #26899 (comment)

and the comment in the code here:

Metrics with value equal to initValue should be filtered out, since they are either invalid or safe to filter without changing...

However, in MetricUtils.stringValue, it keeps 0 as a valid value in the filter logic, even it's possible that the initValue for these metrics can be 0. This seems to contradict the semantics of initValue and makes the invalid/uninitialized values pollute the min, med... calculation - for example, the med may point to an unintialized/initValue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants