Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/06 07:15:02 UTC

[GitHub] [spark] amaliujia opened a new pull request #35404: [SPARK-38118][SQL] MEAN(Boolean) in the HAVING clause should throw data mismatch error

amaliujia opened a new pull request #35404:
URL: https://github.com/apache/spark/pull/35404

### What changes were proposed in this pull request?

```
with t as (select true c)
select t.c
from t
group by t.c
having mean(t.c) > 0
```

This query throws `Column 't.c' does not exist. Did you mean one of the following? [t.c]`

However, `mean(boolean)` is not a supported function signature, so the error should instead be `cannot resolve 'mean(t.c)' due to data type mismatch: function average requires numeric or interval types, not boolean`

This happens because:

1. The `mean(boolean)` in HAVING is not marked as resolved by the `ResolveFunctions` rule.
2. As a result, in `ResolveAggregateFunctions`, the `TempResolvedColumn` wrapper in `mean(TempResolvedColumn(t.c))` cannot be removed (only a resolved aggregate can have its `TempResolvedColumn` removed).
3. When a later rule in the batch is applied, the `TempResolvedColumn` is reverted and the expression becomes `mean(t.c)`, so `mean` loses the information that `t.c` had been resolved.
4. In the last step, the analyzer can therefore only report that `t.c` was not found.
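
The four steps above can be traced with a small toy model. This is only a sketch under loud assumptions: `UnresolvedAttr`, `TempResolved`, `Resolved`, `Mean`, and the functions in `HavingDemo` are hypothetical stand-ins invented for illustration. They are not Spark's analyzer classes and do not reproduce its real rule ordering.

```scala
// Toy model only: hypothetical stand-ins, not Spark's analyzer classes.
sealed trait Expr
final case class UnresolvedAttr(name: String) extends Expr
final case class TempResolved(name: String) extends Expr
final case class Resolved(name: String) extends Expr
final case class Mean(child: Expr, resolved: Boolean) extends Expr

object HavingDemo {
  // Step 2: the temporary wrapper is stripped only if the aggregate is resolved.
  def resolveAggregate(e: Expr): Expr = e match {
    case Mean(TempResolved(n), true) => Mean(Resolved(n), true)
    case other => other
  }

  // Step 3: a wrapper that survived is reverted to an unresolved attribute.
  def revertTemp(e: Expr): Expr = e match {
    case Mean(TempResolved(n), r) => Mean(UnresolvedAttr(n), r)
    case other => other
  }

  // Step 4: all that is left to report is the unresolved attribute.
  def report(e: Expr): String = e match {
    case Mean(UnresolvedAttr(n), _) => s"Column '$n' does not exist"
    case _ => "ok"
  }

  def analyze(columnIsBoolean: Boolean): String = {
    // Step 1: mean over a boolean column fails its type check,
    // so the aggregate is never marked as resolved.
    val mean = Mean(TempResolved("t.c"), resolved = !columnIsBoolean)
    report(revertTemp(resolveAggregate(mean)))
  }
}
```

In this model, `HavingDemo.analyze(columnIsBoolean = true)` ends at the "does not exist" message, while a numeric column passes through cleanly. The type mismatch itself is never surfaced, which is the behavior this PR aims to correct.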

`mean(boolean)` in HAVING is not marked as resolved by the `ResolveFunctions` rule because:

1. It uses the default `Expression` code for populating the `resolved` field: `lazy val resolved: Boolean = childrenResolved && checkInputDataTypes().isSuccess`
2. During analysis, `mean(boolean)` is `mean(TempResolvedColumn(boolean))`, so `childrenResolved` is true.
3. However, `checkInputDataTypes()` fails for a boolean input ([Average.scala#L55](https://github.com/apache/spark/blob/74ebef243c18e7a8f32bf90ea75ab6afed9e3132/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala#L55)).
4. So `Average`'s `resolved` ends up false, which is correct in itself but leads to the wrong error message.
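
A minimal sketch of how that default plays out, assuming a simplified expression hierarchy: `Expr`, `TypeOk`, `TypeMismatch`, `TempCol`, and `Avg` below are hypothetical stand-ins that only mimic the shape of Catalyst's API. They are not Spark classes, and the numeric check is reduced to a single integer type.

```scala
// Toy model only: these types imitate the shape of Catalyst's Expression API
// but are hypothetical and are NOT Spark classes.
sealed trait TypeCheck { def isSuccess: Boolean }
case object TypeOk extends TypeCheck { val isSuccess = true }
final case class TypeMismatch(message: String) extends TypeCheck { val isSuccess = false }

sealed trait DataType
case object BooleanType extends DataType
case object IntegerType extends DataType

trait Expr {
  def children: Seq[Expr]
  def dataType: DataType
  def checkInputDataTypes(): TypeCheck = TypeOk
  def childrenResolved: Boolean = children.forall(_.resolved)
  // Same formula as the default quoted in the list above.
  lazy val resolved: Boolean = childrenResolved && checkInputDataTypes().isSuccess
}

// Stand-in for a TempResolvedColumn: a leaf that counts as resolved.
final case class TempCol(name: String, dataType: DataType) extends Expr {
  val children: Seq[Expr] = Nil
}

// Stand-in for Average: accepts only IntegerType in this toy model.
final case class Avg(child: Expr) extends Expr {
  val children: Seq[Expr] = Seq(child)
  val dataType: DataType = IntegerType // placeholder result type
  override def checkInputDataTypes(): TypeCheck = child.dataType match {
    case IntegerType => TypeOk
    case other => TypeMismatch(s"function average requires numeric or interval types, not $other")
  }
}

object ResolvedDemo {
  def main(args: Array[String]): Unit = {
    val avg = Avg(TempCol("t.c", BooleanType))
    println(avg.childrenResolved) // the child is resolved
    println(avg.resolved)         // the type check fails, so Avg is not resolved
  }
}
```

The point of the sketch is that a single `resolved` flag cannot distinguish "a child is unresolved" from "the children are resolved but the input types are wrong", so a rule that looks only at `resolved` treats both cases the same way.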

### Why are the changes needed?

To improve the error message so that users can debug their queries more easily.

### Does this PR introduce _any_ user-facing change?

Yes. This changes a user-facing error message.

### How was this patch tested?

Unit Test

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org