Posted to issues@spark.apache.org by "Takeshi Yamamuro (Jira)" <ji...@apache.org> on 2019/12/17 13:36:00 UTC
[jira] [Commented] (SPARK-29708) Different answers in aggregates of multiple grouping sets

    [ https://issues.apache.org/jira/browse/SPARK-29708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998229#comment-16998229 ] 

Takeshi Yamamuro commented on SPARK-29708:
------------------------------------------

I'm looking into this issue, SPARK-29699, and SPARK-29701.

> Different answers in aggregates of multiple grouping sets
> ---------------------------------------------------------
>
>                 Key: SPARK-29708
>                 URL: https://issues.apache.org/jira/browse/SPARK-29708
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Takeshi Yamamuro
>            Priority: Major
>              Labels: correctness
>
> The query below with multiple grouping sets returns different answers in PgSQL and Spark (4 rows vs. 2 rows):
> {code:java}
> postgres=# create table gstest4(id integer, v integer, unhashable_col bit(4), unsortable_col xid);
> postgres=# insert into gstest4
> postgres-# values (1,1,b'0000','1'), (2,2,b'0001','1'),
> postgres-#        (3,4,b'0010','2'), (4,8,b'0011','2'),
> postgres-#        (5,16,b'0000','2'), (6,32,b'0001','2'),
> postgres-#        (7,64,b'0010','1'), (8,128,b'0011','1');
> INSERT 0 8
> postgres=# select unsortable_col, count(*)
> postgres-#   from gstest4 group by grouping sets ((unsortable_col),(unsortable_col))
> postgres-#   order by text(unsortable_col);
>  unsortable_col | count 
> ----------------+-------
>               1 |     8
>               1 |     8
>               2 |     8
>               2 |     8
> (4 rows)
> {code}
> {code:java}
> scala> sql("""create table gstest4(id integer, v integer, unhashable_col /* bit(4) */ byte, unsortable_col /* xid */ integer) using parquet""")
> scala> sql("""
>      | insert into gstest4
>      | values (1,1,tinyint('0'),1), (2,2,tinyint('1'),1),
>      |        (3,4,tinyint('2'),2), (4,8,tinyint('3'),2),
>      |        (5,16,tinyint('0'),2), (6,32,tinyint('1'),2),
>      |        (7,64,tinyint('2'),1), (8,128,tinyint('3'),1)
>      | """)
> res21: org.apache.spark.sql.DataFrame = []
> scala> 
> scala> sql("""
>      | select unsortable_col, count(*)
>      |   from gstest4 group by grouping sets ((unsortable_col),(unsortable_col))
>      |   order by string(unsortable_col)
>      | """).show
> +--------------+--------+
> |unsortable_col|count(1)|
> +--------------+--------+
> |             1|       8|
> |             2|       8|
> +--------------+--------+
> {code}
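
For reference, PostgreSQL's output matches the usual reading of GROUPING SETS as the UNION ALL of one GROUP BY per listed set. Under that reading, the repeated set in ((unsortable_col),(unsortable_col)) contributes its groups twice. The sketch below models that reading in plain Scala, without Spark, to make the expected 4-row shape explicit. The names `GroupingSetsSketch` and `groupingSetsCount` are hypothetical and are not Spark APIs. The sketch uses exactly the eight rows from the INSERT statements above, which gives four rows per unsortable_col value. Both transcripts report a count of 8, which would be consistent with the table having been loaded twice, so compare the number of result rows rather than the count values.

```scala
// Hypothetical, Spark-free model of GROUPING SETS semantics, assuming the
// SQL-standard reading that PostgreSQL follows: the result is the UNION ALL
// of one GROUP BY per listed grouping set, so a repeated set repeats rows.
object GroupingSetsSketch {
  type Row = Map[String, Int]

  // Rows of gstest4 from the report (id, v, unsortable_col); unhashable_col
  // is left out because it plays no part in this query.
  val gstest4: Seq[Row] = Seq(
    (1, 1, 1), (2, 2, 1), (3, 4, 2), (4, 8, 2),
    (5, 16, 2), (6, 32, 2), (7, 64, 1), (8, 128, 1)
  ).map { case (id, v, u) => Map("id" -> id, "v" -> v, "unsortable_col" -> u) }

  // Models: select col, count(*) from rows group by grouping sets (s1, s2, ...)
  // Each listed set is evaluated separately and the results are concatenated.
  // If `col` is not in the current set it is reported as None (SQL NULL), as
  // for the grand-total set (). Repeated sets are never de-duplicated.
  def groupingSetsCount(rows: Seq[Row], col: String, sets: Seq[Set[String]]): Seq[(Option[Int], Long)] =
    sets.flatMap { set =>
      rows
        .groupBy(r => if (set.contains(col)) Some(r(col)) else None)
        .toSeq
        .map { case (key, group) => (key, group.size.toLong) }
    }.sortBy(_._1) // mirrors the ORDER BY in the report

  def main(args: Array[String]): Unit = {
    val twice = Seq(Set("unsortable_col"), Set("unsortable_col"))
    groupingSetsCount(gstest4, "unsortable_col", twice).foreach {
      case (k, n) => println(s"${k.getOrElse("NULL")} | $n")
    }
  }
}
```

Running `main` prints four lines (`1 | 4`, `1 | 4`, `2 | 4`, `2 | 4`). Passing a single `Set("unsortable_col")` instead yields the two rows that Spark currently returns.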



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
