Posted to reviews@spark.apache.org by "davintjong-db (via GitHub)" <gi...@apache.org> on 2023/12/07 00:34:53 UTC

[PR] [WIP][SPARK-46294][SQL] Clean up semantics of init vs zero value [spark]

davintjong-db opened a new pull request, #44222:
URL: https://github.com/apache/spark/pull/44222

   
   ### What changes were proposed in this pull request?
   
   This PR cleans up the semantics of the init and zero values as follows. This also helps define what an "invalid" metric is.
   
   initValue is the starting value for a SQLMetric. If a metric has a value equal to its initValue, it should be filtered out before aggregating with SQLMetrics.stringValue().
    
   zeroValue defines the lowest value considered valid. If a SQLMetric is invalid, it is set to zeroValue upon receiving any updates, and it also reports zeroValue as its value to avoid exposing it to the user programmatically (a concern previously addressed in [SPARK-41442](https://issues.apache.org/jira/browse/SPARK-41442)).
   
   For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that the metric is by default invalid. At the end of a task, we will update the metric, making it valid, and the invalid metrics will be filtered out when calculating min, max, etc., as a workaround for [SPARK-11013](https://issues.apache.org/jira/browse/SPARK-11013).
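   
   To make the intended semantics concrete, here is a minimal, self-contained Scala sketch of the behavior described above. The class and method names are illustrative only; this is not the actual SQLMetric implementation, and `isValid` here is one plausible reading of "zeroValue defines the lowest value considered valid":
   
   ```scala
   // Illustrative sketch of the initValue/zeroValue semantics described above.
   class SketchMetric(val initValue: Long = 0L, val zeroValue: Long = 0L) {
     private var _value = initValue
   
     // Assumed reading: any value below zeroValue (e.g. the -1 sentinel) is invalid.
     def isValid: Boolean = _value >= zeroValue
   
     def add(v: Long): Unit = {
       if (!isValid) _value = zeroValue // first update snaps an invalid metric to zeroValue
       _value += v
     }
   
     // Report zeroValue rather than the initValue sentinel, so users never see -1.
     def value: Long = if (isValid) _value else zeroValue
   }
   
   object SketchMetricDemo {
     def main(args: Array[String]): Unit = {
       val m = new SketchMetric(initValue = -1L, zeroValue = 0L)
       println(m.value) // 0: an invalid metric reports zeroValue, not -1
       m.add(5L)
       println(m.value) // 5: valid after its first update
     }
   }
   ```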
   
   
   ### Why are the changes needed?
   
   The semantics of initValue and _zeroValue in SQLMetrics are confusing, since they effectively mean the same thing. Redefining them as described above is clearer, especially in terms of defining what an "invalid" metric is.
   
   ### Does this PR introduce _any_ user-facing change?
   No. This shouldn't change any behavior.
   
   ### How was this patch tested?
   Existing tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46294][SQL] Clean up semantics of init vs zero value [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #44222:
URL: https://github.com/apache/spark/pull/44222#discussion_r1426174905


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala:
##########
@@ -37,36 +37,47 @@ import org.apache.spark.util.AccumulatorContext.internOption
  * the executor side are automatically propagated and shown in the SQL UI through metrics. Updates
  * on the driver side must be explicitly posted using [[SQLMetrics.postDriverMetricUpdates()]].
  */
-class SQLMetric(val metricType: String, initValue: Long = 0L) extends AccumulatorV2[Long, Long] {
-  // This is a workaround for SPARK-11013.
-  // We may use -1 as initial value of the accumulator, if the accumulator is valid, we will
-  // update it at the end of task and the value will be at least 0. Then we can filter out the -1
-  // values before calculate max, min, etc.
-  private[this] var _value = initValue
-  private var _zeroValue = initValue
+class SQLMetric(val metricType: String,
+                initValue: Long = 0L,
+                zeroValue: Long = 0L) extends AccumulatorV2[Long, Long] {
+  // initValue defines the initial value of the metric. zeroValue defines the lowest value
+  // considered valid. If a SQLMetric is invalid, it is set to zeroValue upon receiving any
+  // updates, and it also reports zeroValue as its value to avoid exposing it to the user
+  // programmatically.
+  //
+  // For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that the metric is
+  // by default invalid. At the end of a task, we will update the metric making it valid, and the
+  // invalid metrics will be filtered out when calculating min, max, etc. as a workaround for
+  // SPARK-11013.
+  private var _value = initValue
 
   override def copy(): SQLMetric = {
-    val newAcc = new SQLMetric(metricType, _value)
-    newAcc._zeroValue = initValue
+    val newAcc = new SQLMetric(metricType, initValue, zeroValue)
+    newAcc._value = _value
     newAcc
   }
 
-  override def reset(): Unit = _value = _zeroValue
+  override def reset(): Unit = _value = initValue
 
   override def merge(other: AccumulatorV2[Long, Long]): Unit = other match {
     case o: SQLMetric =>
-      if (o.value > 0) {
-        if (_value < 0) _value = 0
+      if (o.isValid) {
+        if (!isValid) _value = zeroValue
         _value += o.value
       }
     case _ => throw QueryExecutionErrors.cannotMergeClassWithOtherClassError(
       this.getClass.getName, other.getClass.getName)
   }
 
-  override def isZero: Boolean = _value == _zeroValue
+  // This is used to filter out metrics. Metrics with value equal to initValue should
+  // be filtered out, since they are either invalid or safe to filter without changing
+  // the aggregation defined in [[SQLMetrics.stringValue]].
+  override def isZero: Boolean = _value == initValue

Review Comment:
   this is a bit tricky as `isZero` is not true when we actually have the zero value...
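   
   To make the subtlety concrete (hypothetical values following the semantics in the PR description, not code from this PR):
   
   ```scala
   // With initValue = -1 and zeroValue = 0, a task may legitimately report 0.
   val initValue = -1L
   val zeroValue = 0L
   var value = initValue
   value = zeroValue            // a valid update that happens to be 0
   println(value == initValue)  // false: an initValue-based isZero is false
                                // even though the metric holds the zero value
   ```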




Re: [PR] [SPARK-46294][SQL] Clean up semantics of init vs zero value [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #44222:
URL: https://github.com/apache/spark/pull/44222#discussion_r1426175626


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala:
##########
@@ -37,36 +37,47 @@ import org.apache.spark.util.AccumulatorContext.internOption
  * the executor side are automatically propagated and shown in the SQL UI through metrics. Updates
  * on the driver side must be explicitly posted using [[SQLMetrics.postDriverMetricUpdates()]].
  */
-class SQLMetric(val metricType: String, initValue: Long = 0L) extends AccumulatorV2[Long, Long] {
-  // This is a workaround for SPARK-11013.
-  // We may use -1 as initial value of the accumulator, if the accumulator is valid, we will
-  // update it at the end of task and the value will be at least 0. Then we can filter out the -1
-  // values before calculate max, min, etc.
-  private[this] var _value = initValue
-  private var _zeroValue = initValue
+class SQLMetric(val metricType: String,
+                initValue: Long = 0L,
+                zeroValue: Long = 0L) extends AccumulatorV2[Long, Long] {
+  // initValue defines the initial value of the metric. zeroValue defines the lowest value
+  // considered valid. If a SQLMetric is invalid, it is set to zeroValue upon receiving any
+  // updates, and it also reports zeroValue as its value to avoid exposing it to the user
+  // programmatically.
+  //
+  // For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that the metric is
+  // by default invalid. At the end of a task, we will update the metric making it valid, and the
+  // invalid metrics will be filtered out when calculating min, max, etc. as a workaround for
+  // SPARK-11013.
+  private var _value = initValue
 
   override def copy(): SQLMetric = {
-    val newAcc = new SQLMetric(metricType, _value)
-    newAcc._zeroValue = initValue
+    val newAcc = new SQLMetric(metricType, initValue, zeroValue)
+    newAcc._value = _value
     newAcc
   }
 
-  override def reset(): Unit = _value = _zeroValue
+  override def reset(): Unit = _value = initValue
 
   override def merge(other: AccumulatorV2[Long, Long]): Unit = other match {
     case o: SQLMetric =>
-      if (o.value > 0) {
-        if (_value < 0) _value = 0
+      if (o.isValid) {
+        if (!isValid) _value = zeroValue
         _value += o.value
       }
     case _ => throw QueryExecutionErrors.cannotMergeClassWithOtherClassError(
       this.getClass.getName, other.getClass.getName)
   }
 
-  override def isZero: Boolean = _value == _zeroValue
+  // This is used to filter out metrics. Metrics with value equal to initValue should
+  // be filtered out, since they are either invalid or safe to filter without changing
+  // the aggregation defined in [[SQLMetrics.stringValue]].
+  override def isZero: Boolean = _value == initValue

Review Comment:
   Let's enrich the comment to highlight that we may want to collect the 0 value for calculating min/max/avg. We can still link to SPARK-11013.
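   
   One possible wording for the enriched comment (illustrative only, not the text that was eventually merged):
   
   ```scala
   // isZero tells the accumulator framework whether this metric carries any
   // information. It intentionally compares against initValue, not zeroValue:
   // a metric that was validly updated to zeroValue (e.g. a row count of 0)
   // must still be collected, because that 0 should participate in the
   // min/max/avg computed by SQLMetrics.stringValue. Metrics still at
   // initValue are either invalid or safe to drop without changing the
   // aggregation (workaround for SPARK-11013).
   override def isZero: Boolean = _value == initValue
   ```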




Re: [PR] [SPARK-46294][SQL] Clean up semantics of init vs zero value [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #44222:
URL: https://github.com/apache/spark/pull/44222#discussion_r1426171810


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala:
##########
@@ -37,36 +37,47 @@ import org.apache.spark.util.AccumulatorContext.internOption
  * the executor side are automatically propagated and shown in the SQL UI through metrics. Updates
  * on the driver side must be explicitly posted using [[SQLMetrics.postDriverMetricUpdates()]].
  */
-class SQLMetric(val metricType: String, initValue: Long = 0L) extends AccumulatorV2[Long, Long] {
-  // This is a workaround for SPARK-11013.
-  // We may use -1 as initial value of the accumulator, if the accumulator is valid, we will
-  // update it at the end of task and the value will be at least 0. Then we can filter out the -1
-  // values before calculate max, min, etc.
-  private[this] var _value = initValue
-  private var _zeroValue = initValue
+class SQLMetric(val metricType: String,

Review Comment:
   ```suggestion
   class SQLMetric(
       val metricType: String,
   ```




Re: [PR] [SPARK-46294][SQL] Clean up semantics of init vs zero value [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #44222:
URL: https://github.com/apache/spark/pull/44222#issuecomment-1856973646

   thanks, merging to master!



Re: [PR] [SPARK-46294][SQL] Clean up semantics of init vs zero value [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #44222:
URL: https://github.com/apache/spark/pull/44222#discussion_r1426171943


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala:
##########
@@ -37,36 +37,47 @@ import org.apache.spark.util.AccumulatorContext.internOption
  * the executor side are automatically propagated and shown in the SQL UI through metrics. Updates
  * on the driver side must be explicitly posted using [[SQLMetrics.postDriverMetricUpdates()]].
  */
-class SQLMetric(val metricType: String, initValue: Long = 0L) extends AccumulatorV2[Long, Long] {
-  // This is a workaround for SPARK-11013.
-  // We may use -1 as initial value of the accumulator, if the accumulator is valid, we will
-  // update it at the end of task and the value will be at least 0. Then we can filter out the -1
-  // values before calculate max, min, etc.
-  private[this] var _value = initValue
-  private var _zeroValue = initValue
+class SQLMetric(val metricType: String,

Review Comment:
   4 spaces indentation for multi-line parameters
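   
   Applied to the full signature, the suggested style would read as follows (a formatting sketch based on this suggestion, not the merged diff):
   
   ```scala
   class SQLMetric(
       val metricType: String,
       initValue: Long = 0L,
       zeroValue: Long = 0L) extends AccumulatorV2[Long, Long] {
   ```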




Re: [PR] [SPARK-46294][SQL] Clean up semantics of init vs zero value [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan closed pull request #44222: [SPARK-46294][SQL] Clean up semantics of init vs zero value
URL: https://github.com/apache/spark/pull/44222

