Posted to issues@spark.apache.org by "Ionut Boicu (Jira)" <ji...@apache.org> on 2021/08/12 07:09:00 UTC

[jira] [Created] (SPARK-36489) Aggregate functions over no grouping keys, on tables with a single bucket, return multiple rows

Ionut Boicu created SPARK-36489:
-----------------------------------

             Summary: Aggregate functions over no grouping keys, on tables with a single bucket, return multiple rows
                 Key: SPARK-36489
                 URL: https://issues.apache.org/jira/browse/SPARK-36489
             Project: Spark
          Issue Type: Bug
          Components: Optimizer
    Affects Versions: 3.1.2, 3.1.1, 3.1.0, 3.2.0, 3.1.3
            Reporter: Ionut Boicu


Running any aggregate function without grouping keys on a table with a single bucket returns multiple rows instead of one.

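This happens because the single-bucket scan already satisfies the `AllTuples` distribution that the aggregate requires, so no `Exchange` is planned. The bucketed scan is then disabled, and nothing gathers the rows into a single partition before the aggregate runs.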
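
The effect described above can be illustrated with a small toy model in plain Scala. This is not Spark code: `NoGroupingAggregateModel`, `sumPerPartition` and `gatherToSinglePartition` are hypothetical names, the ids are illustrative values, and the sketch assumes that the non-bucketed scan places the two files in separate partitions.

```scala
// Toy model, not Spark internals: a "partition" is just a sequence of ids.
object NoGroupingAggregateModel {
  type Partition = Seq[Long]

  // An aggregate without grouping keys, evaluated independently on each
  // partition, emits one output row per partition.
  def sumPerPartition(partitions: Seq[Partition]): Seq[Long] =
    partitions.map(_.sum)

  // What an AllTuples requirement asks for: every row in one partition.
  def gatherToSinglePartition(partitions: Seq[Partition]): Seq[Partition] =
    Seq(partitions.flatten)

  def main(args: Array[String]): Unit = {
    // Assumption: two separate inserts left two files in the single bucket.
    val files: Seq[Partition] = Seq(Seq(1L), Seq(2L))

    // Bucketed scan: one bucket maps to one partition holding both files.
    val bucketedScan = gatherToSinglePartition(files)
    // Non-bucketed scan (assumed here): each file ends up in its own partition.
    val nonBucketedScan = files

    println(sumPerPartition(bucketedScan).size)    // prints 1
    println(sumPerPartition(nonBucketedScan).size) // prints 2
  }
}
```

In this model the only difference between the two cases is whether the rows were gathered into one partition before aggregating, which is the job an `Exchange` would otherwise do.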

 

Reproduction:

 
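{code:scala}
// Create a bucketed table with exactly one bucket.
sql(
   """
   |CREATE TABLE t1 (`id` BIGINT, `event_date` DATE)
   |USING PARQUET
   |CLUSTERED BY (id)
   |INTO 1 BUCKETS
   |""".stripMargin)

// Insert two rows in separate statements.
sql(
   """
   |INSERT INTO TABLE t1 VALUES(1.23, cast("2021-07-07" as date))
   |""".stripMargin)

sql(
   """
   |INSERT INTO TABLE t1 VALUES(2.28, cast("2021-08-08" as date))
   |""".stripMargin)

// Expected: exactly one row. On affected versions this assertion fails
// because the query returns multiple rows.
assert(sql("select sum(id) from t1 where id is not null").count == 1)
{code}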
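
To look at what the planner did for the query above, something along these lines could be pasted into `spark-shell` after the reproduction. It is only a diagnostic sketch: `inspectRepro` is a hypothetical helper, and the idea that turning off `spark.sql.sources.bucketing.autoBucketedScan.enabled` keeps the bucketed scan in place for this query is an assumption to verify, not something stated in this report.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper, not part of the report: prints plan details for the repro query.
def inspectRepro(spark: SparkSession): Unit = {
  val query = "select sum(id) from t1 where id is not null"

  // Physical plan: check whether an Exchange appears and how the scan of t1 is described.
  spark.sql(query).explain()

  // Partition count and row count of the result, for comparison.
  println(s"result partitions: ${spark.sql(query).rdd.getNumPartitions}")
  println(s"result rows: ${spark.sql(query).count()}")

  // Assumption: with automatic disabling of bucketed scans turned off,
  // the scan of t1 stays bucketed; compare the row count again.
  spark.conf.set("spark.sql.sources.bucketing.autoBucketedScan.enabled", "false")
  println(s"result rows with auto bucketed scan off: ${spark.sql(query).count()}")
}
```

Calling `inspectRepro(spark)` on an affected version and comparing the two row counts should show whether the bucketed-scan setting is involved.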