[SPARK-51299][SQL][UI] MetricUtils.stringValue should filter metric values with initValue rather than a hardcoded value #50055

jiwen624 · 2025-02-24T03:00:14Z

What changes were proposed in this pull request?

This PR proposes to use initValue of a metric in org.apache.spark.util.MetricUtils.stringValue instead of a hardcoded value 0 to filter out invalid metric values. A new field initValue is added to SQLMetricInfo and SQLPlanMetric so it can be passed into org.apache.spark.util.MetricUtils.stringValue to do the filtering accordingly.

Why are the changes needed?

In method org.apache.spark.util.MetricUtils.stringValue, it uses a hardcoded value 0 to filter out invalid metric values for SIZE_METRIC, TIMING_METRIC and NS_TIMING_METRIC:

val validValues = values.filter(_ >= 0)
However, in SQLMetrics it offers methods to create these types of metrics with initValue other than -1 (introduced in this PR #41555) :

def createSizeMetric(sc: SparkContext, name: String, initValue: Long = -1): SQLMetric = {
which means there is a chance that the metrics are created with a initValue != -1 and in this case the filter above will generate incorrect results.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing UTs

Was this patch authored or co-authored using generative AI tooling?

No.

jiwen624 · 2025-02-24T05:02:47Z

@dongjoon-hyun @cloud-fan could you let me know your thoughts about this fix? Thanks.

cloud-fan · 2025-02-24T14:06:19Z

I'm not very convinced by the "Why" section. What's the end-to-end problem you hit?

BTW #47721 has some more context about the SQLMetric initial value.

jiwen624 · 2025-02-26T05:12:17Z

@cloud-fan Thanks a lot for the response and the context in #47721. It's very helpful.
From what I see in previous PRs and comments in the code, it seems that we treat initValue in SQLMetric as a invalid value which should be filtered out. For example, in this recent PR #44222, it mentions:

initValue is the starting value for a SQLMetric. If a metric has value equal to its initValue, then it can/should be filtered out before aggregating with SQLMetrics.stringValue().

and here the comment about the idea of having -1 indicates that it's originally introduced to "distinguish whether a SQLMetric is initialized or not" : #26899 (comment)

and the comment in the code here:

Metrics with value equal to initValue should be filtered out, since they are either invalid or safe to filter without changing...

However, in MetricUtils.stringValue, it keeps 0 as a valid value in the filter logic, even it's possible that the initValue for these metrics can be 0. This seems to contradict the semantics of initValue and makes the invalid/uninitialized values pollute the min, med... calculation - for example, the med may point to an unintialized/initValue.

filter out invalid metric values with initValue

4770c18

github-actions bot added SQL WEB UI CORE PYTHON labels Feb 24, 2025

jiwen624 changed the title ~~[WIP][SPARK-51299][SQL][Web UI] Filter out invalid metric values with initValue~~ [WIP][SPARK-51299][SQL][UI] Filter out invalid metric values with initValue Feb 24, 2025

jiwen624 changed the title ~~[WIP][SPARK-51299][SQL][UI] Filter out invalid metric values with initValue~~ [WIP][SPARK-51299][SQL][UI] MetricUtils.stringValue should filter metric values with initValue rather than a hardcoded value Feb 24, 2025

jiwen624 changed the title ~~[WIP][SPARK-51299][SQL][UI] MetricUtils.stringValue should filter metric values with initValue rather than a hardcoded value~~ [SPARK-51299][SQL][UI] MetricUtils.stringValue should filter metric values with initValue rather than a hardcoded value Feb 24, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-51299][SQL][UI] MetricUtils.stringValue should filter metric values with initValue rather than a hardcoded value #50055

[SPARK-51299][SQL][UI] MetricUtils.stringValue should filter metric values with initValue rather than a hardcoded value #50055

jiwen624 commented Feb 24, 2025 •

edited

Loading

jiwen624 commented Feb 24, 2025

cloud-fan commented Feb 24, 2025

jiwen624 commented Feb 26, 2025

[SPARK-51299][SQL][UI] MetricUtils.stringValue should filter metric values with initValue rather than a hardcoded value #50055

Are you sure you want to change the base?

[SPARK-51299][SQL][UI] MetricUtils.stringValue should filter metric values with initValue rather than a hardcoded value #50055

Conversation

jiwen624 commented Feb 24, 2025 • edited Loading

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

jiwen624 commented Feb 24, 2025

cloud-fan commented Feb 24, 2025

jiwen624 commented Feb 26, 2025

jiwen624 commented Feb 24, 2025 •

edited

Loading