You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "jiaan.geng (Jira)" <ji...@apache.org> on 2020/12/22 10:17:00 UTC
[jira] [Commented] (SPARK-29699) Different answers in nested
aggregates with window functions
[ https://issues.apache.org/jira/browse/SPARK-29699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253393#comment-17253393 ]
jiaan.geng commented on SPARK-29699:
------------------------------------
The behavior of various mainstream databases is different, it seems that there is no need to force which one.
> Different answers in nested aggregates with window functions
> ------------------------------------------------------------
>
> Key: SPARK-29699
> URL: https://issues.apache.org/jira/browse/SPARK-29699
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Takeshi Yamamuro
> Priority: Critical
>
> A nested aggregate below with a window function seems to have different answers in the `rsum` column between PgSQL and Spark;
> {code:java}
> postgres=# create table gstest2 (a integer, b integer, c integer, d integer, e integer, f integer, g integer, h integer);
> postgres=# insert into gstest2 values
> postgres-# (1, 1, 1, 1, 1, 1, 1, 1),
> postgres-# (1, 1, 1, 1, 1, 1, 1, 2),
> postgres-# (1, 1, 1, 1, 1, 1, 2, 2),
> postgres-# (1, 1, 1, 1, 1, 2, 2, 2),
> postgres-# (1, 1, 1, 1, 2, 2, 2, 2),
> postgres-# (1, 1, 1, 2, 2, 2, 2, 2),
> postgres-# (1, 1, 2, 2, 2, 2, 2, 2),
> postgres-# (1, 2, 2, 2, 2, 2, 2, 2),
> postgres-# (2, 2, 2, 2, 2, 2, 2, 2);
> INSERT 0 9
> postgres=#
> postgres=# select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
> postgres-# from gstest2 group by rollup (a,b) order by rsum, a, b;
> a | b | sum | rsum
> ---+---+-----+------
> 1 | 1 | 8 | 8
> 1 | 2 | 2 | 10
> 1 | | 10 | 20
> 2 | 2 | 2 | 22
> 2 | | 2 | 24
> | | 12 | 36
> (6 rows)
> {code}
> {code:java}
> scala> sql("""
> | select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
> | from gstest2 group by rollup (a,b) order by rsum, a, b
> | """).show()
> +----+----+------+----+
> | a| b|sum(c)|rsum|
> +----+----+------+----+
> |null|null| 12| 12|
> | 1|null| 10| 22|
> | 1| 1| 8| 30|
> | 1| 2| 2| 32|
> | 2|null| 2| 34|
> | 2| 2| 2| 36|
> +----+----+------+----+
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org