Posted to issues@spark.apache.org by "Takeshi Yamamuro (Jira)" <ji...@apache.org> on 2020/03/06 08:00:00 UTC

[jira] [Resolved] (SPARK-30279) Support 32 or more grouping attributes for GROUPING_ID

     [ https://issues.apache.org/jira/browse/SPARK-30279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takeshi Yamamuro resolved SPARK-30279.
--------------------------------------
    Fix Version/s: 3.1.0
         Assignee: Takeshi Yamamuro
       Resolution: Fixed

Resolved by [https://github.com/apache/spark/pull/26918]

> Support 32 or more grouping attributes for GROUPING_ID 
> -------------------------------------------------------
>
>                 Key: SPARK-30279
>                 URL: https://issues.apache.org/jira/browse/SPARK-30279
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0, 2.4.6
>            Reporter: Takeshi Yamamuro
>            Assignee: Takeshi Yamamuro
>            Priority: Major
>             Fix For: 3.1.0
>
>
> This ticket aims to support 32 or more grouping attributes for GROUPING_ID. In the current master, an integer overflow can occur when computing grouping IDs:
> https://github.com/apache/spark/blob/e75d9afb2f282ce79c9fd8bce031287739326a4f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala#L613
> For example, the query below generates wrong grouping IDs on the current master:
> {code}
> scala> val numCols = 32 // or, 31
> scala> val cols = (0 until numCols).map { i => s"c$i" }
> scala> sql(s"create table test_$numCols (${cols.map(c => s"$c int").mkString(",")}, v int) using parquet")
> scala> val insertVals = (0 until numCols).map { _ => 1 }.mkString(",")
> scala> sql(s"insert into test_$numCols values ($insertVals,3)")
> scala> sql(s"select grouping_id(), sum(v) from test_$numCols group by grouping sets ((${cols.mkString(",")}), (${cols.init.mkString(",")}))").show(10, false)
> scala> sql(s"drop table test_$numCols")
> // numCols = 32
> +-------------+------+
> |grouping_id()|sum(v)|
> +-------------+------+
> |0            |3     |
> |0            |3     | // Wrong Grouping ID
> +-------------+------+
> // numCols = 31
> +-------------+------+
> |grouping_id()|sum(v)|
> +-------------+------+
> |0            |3     |
> |1            |3     |
> +-------------+------+
> {code}
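
The overflow described above can be reproduced outside Spark with a small sketch. It assumes the grouping ID is built as a bitmask with one bit per grouping column, where a bit stays set when the column is not part of the grouping set. The object and the helpers `groupingIdInt` and `groupingIdLong` are hypothetical names for illustration, not Spark's actual code, and the 64-bit variant is shown only for contrast.

```scala
// Hypothetical illustration, not Spark's implementation.
object GroupingIdOverflow {
  // 32-bit variant: on the JVM, `1 << 32` on an Int wraps to 1,
  // so the initial "all bits set" mask becomes 0 when numCols == 32.
  def groupingIdInt(numCols: Int, selected: Set[Int]): Int = {
    val allSet = (1 << numCols) - 1
    // Clear the bit of every column that belongs to the grouping set.
    selected.foldLeft(allSet) { (mask, i) => mask & ~(1 << (numCols - 1 - i)) }
  }

  // 64-bit variant: the same logic with Long arithmetic.
  def groupingIdLong(numCols: Int, selected: Set[Int]): Long = {
    val allSet = (1L << numCols) - 1L
    selected.foldLeft(allSet) { (mask, i) => mask & ~(1L << (numCols - 1 - i)) }
  }

  def main(args: Array[String]): Unit = {
    for (n <- Seq(31, 32)) {
      // Grouping set that omits only the last column, like cols.init in the query.
      val allButLast = (0 until n - 1).toSet
      println(s"numCols=$n Int=${groupingIdInt(n, allButLast)} Long=${groupingIdLong(n, allButLast)}")
    }
    // prints:
    // numCols=31 Int=1 Long=1
    // numCols=32 Int=0 Long=1
  }
}
```

Under these assumptions, the 32-column case collapses to 0 with `Int` arithmetic, which matches the wrong grouping ID in the output above, while the 31-column case still yields 1.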



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
