Posted to issues@spark.apache.org by "Ionut Boicu (Jira)" <ji...@apache.org> on 2021/08/12 07:09:00 UTC
[jira] [Created] (SPARK-36489) Aggregate functions over no grouping
keys, on tables with a single bucket, return multiple rows
Ionut Boicu created SPARK-36489:
-----------------------------------
Summary: Aggregate functions over no grouping keys, on tables with a single bucket, return multiple rows
Key: SPARK-36489
URL: https://issues.apache.org/jira/browse/SPARK-36489
Project: Spark
Issue Type: Bug
Components: Optimizer
Affects Versions: 3.1.2, 3.1.1, 3.1.0, 3.2.0, 3.1.3
Reporter: Ionut Boicu
Running any aggregate function without grouping keys on a table with a single bucket returns multiple rows instead of one.
This happens because the aggregate requires the `AllTuples` distribution, which the single-bucket scan already satisfies, so no `Exchange` is planned. The bucketed scan is then disabled, so the scan may produce more than one partition, and the aggregate returns one row per partition.
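The mechanism can be pictured with a toy model in plain Scala. This is only a sketch: no Spark is involved, partitions are ordinary sequences, and `aggregateWithoutExchange` and `aggregateWithExchange` are hypothetical names used for illustration. It shows that a global aggregate evaluated independently per partition yields one row per partition, so a single row comes back only if the input is already one partition or the rows are gathered first.

```scala
// Toy model only: partitions are plain Scala sequences, not Spark partitions.
object GlobalAggregateSketch {
  type Partition = Seq[Long]

  // Aggregate with no grouping keys and no Exchange:
  // every partition emits its own sum.
  def aggregateWithoutExchange(partitions: Seq[Partition]): Seq[Long] =
    partitions.map(_.sum)

  // Same aggregate after gathering all rows into one partition,
  // which is what an Exchange to a single partition would guarantee.
  def aggregateWithExchange(partitions: Seq[Partition]): Seq[Long] =
    Seq(partitions.flatten.sum)

  def main(args: Array[String]): Unit = {
    // Assumption: a bucketed scan of a one-bucket table reads all files as one partition.
    val bucketedScan: Seq[Partition] = Seq(Seq(1L, 2L))
    // Assumption: with the bucketed scan disabled, two files may become two partitions.
    val nonBucketedScan: Seq[Partition] = Seq(Seq(1L), Seq(2L))

    println(aggregateWithoutExchange(bucketedScan))    // prints List(3)
    println(aggregateWithoutExchange(nonBucketedScan)) // prints List(1, 2)
    println(aggregateWithExchange(nonBucketedScan))    // prints List(3)
  }
}
```

The second call corresponds to the reported behaviour: the plan was built on the assumption of a single partition, but the scan no longer provides one.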
Reproduction:
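{code:java}
// Create a table with a single bucket.
sql(
  """
    |CREATE TABLE t1 (`id` BIGINT, `event_date` DATE)
    |USING PARQUET
    |CLUSTERED BY (id)
    |INTO 1 BUCKETS
    |""".stripMargin)

// Insert two rows with separate statements, so the bucket holds more than one file.
sql(
  """
    |INSERT INTO TABLE t1 VALUES(1.23, cast("2021-07-07" as date))
    |""".stripMargin)
sql(
  """
    |INSERT INTO TABLE t1 VALUES(2.28, cast("2021-08-08" as date))
    |""".stripMargin)

// Expected: an aggregate without grouping keys returns exactly one row.
assert(sql("select sum(id) from t1 where id is not null").count == 1)
{code}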
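To check whether a given build is affected, one option is to run the query with automatic bucketed-scan disabling switched on and off and compare the number of rows returned. The following is a sketch for `spark-shell`, assuming the predefined `spark` session and the table `t1` from the reproduction. Treating `spark.sql.sources.bucketing.autoBucketedScan.enabled=false` as a workaround is an assumption drawn from the description, not something this report confirms.

```scala
// Sketch for spark-shell; `spark` is the session the shell predefines.
val query = "select sum(id) from t1 where id is not null"
val autoBucketedScan = "spark.sql.sources.bucketing.autoBucketedScan.enabled"

// Remember the current setting so it can be restored afterwards.
val original = spark.conf.get(autoBucketedScan)

for (enabled <- Seq("true", "false")) {
  spark.conf.set(autoBucketedScan, enabled)
  val df = spark.sql(query)
  df.explain() // check whether an Exchange sits between the scan and the aggregate
  // collect() counts the rows actually returned, without planning another aggregate on top.
  println(s"$autoBucketedScan=$enabled -> ${df.collect().length} row(s)")
}

spark.conf.set(autoBucketedScan, original)
```

More than one row under either setting points to the behaviour described in this report.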