Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/11 07:13:47 UTC

[GitHub] [spark] ulysses-you opened a new pull request #35490: [SPARK-38185][SQL] Fix data incorrect if aggregate function is empty

ulysses-you opened a new pull request #35490:
URL: https://github.com/apache/spark/pull/35490

### What changes were proposed in this pull request?

Add an `aggregateExpressions.nonEmpty` check to the `groupOnly` function.

### Why are the changes needed?

The group-only condition should also check whether the list of aggregate expressions is empty.

The DataFrame API allows an aggregation with no aggregate expressions.

The following query should therefore return 1 rather than 0, because it is a global aggregate, which produces one row even when its input is empty:
```scala
val emptyAgg = Map.empty[String, String]
spark.range(2).where("id > 2").agg(emptyAgg).limit(1).count
```
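
To make the reasoning concrete, here is a minimal sketch in plain Scala with no Spark dependency. `ToyAggregate`, its fields, and its methods are hypothetical stand-ins invented for illustration; they are not Spark's `Aggregate` node and not the code changed by this PR. The sketch only models two things: an aggregate without grouping keys emits exactly one row even on empty input, and a "group only" test of the form "every output is a grouping key" is vacuously true when there are no outputs, unless it is guarded by a `nonEmpty` check.

```scala
// Hypothetical toy model; all names are illustrative, not Spark's API.
final case class ToyAggregate(
    groupingKeys: Seq[String],       // empty means a global aggregate
    aggregateOutputs: Seq[String]) { // outputs, modeled as plain column names

  // Unguarded test: "every output is a grouping key".
  // `forall` on an empty list is true, so an empty aggregation passes.
  def groupOnlyUnguarded: Boolean =
    aggregateOutputs.forall(groupingKeys.contains)

  // Guarded test, mirroring the `nonEmpty` check this PR describes.
  def groupOnlyGuarded: Boolean =
    aggregateOutputs.nonEmpty && groupOnlyUnguarded

  // Number of output rows: a global aggregate always yields one row,
  // a grouped aggregate yields one row per distinct key combination.
  def rowCount(rows: Seq[Map[String, Long]]): Int =
    if (groupingKeys.isEmpty) 1
    else rows.map(r => groupingKeys.map(r)).distinct.size
}

object ToyAggregateDemo {
  def main(args: Array[String]): Unit = {
    val noRows = Seq.empty[Map[String, Long]]
    val emptyGlobalAgg = ToyAggregate(groupingKeys = Nil, aggregateOutputs = Nil)
    val groupedById = ToyAggregate(Seq("id"), Seq("id"))

    println(emptyGlobalAgg.groupOnlyUnguarded) // true
    println(emptyGlobalAgg.groupOnlyGuarded)   // false
    println(emptyGlobalAgg.rowCount(noRows))   // 1
    println(groupedById.groupOnlyGuarded)      // true
    println(groupedById.rowCount(noRows))      // 0
  }
}
```

In this model, treating the empty global aggregate as group-only would make it look interchangeable with a plain pass-through of its empty input, which has 0 rows, while the aggregate itself must yield 1 row. The guard prevents that misclassification.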

### Does this PR introduce _any_ user-facing change?

Yes, this is a bug fix.

### How was this patch tested?

Added a test.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
