You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Michael Mior (Jira)" <ji...@apache.org> on 2020/02/01 20:21:00 UTC

[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

    [ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028198#comment-17028198 ] 

Michael Mior commented on CALCITE-1824:
---------------------------------------

Fixed in [1eb22d0|https://github.com/apache/calcite/commit/1eb22d03c11b9362b50208261b99eecc3de28fe8].

 

> GROUP_ID returns wrong result
> -----------------------------
>
>                 Key: CALCITE-1824
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1824
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Julian Hyde
>            Assignee: Feng Zhu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} is useful only if you have duplicate grouping sets. If grouping sets are distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
>     DEPTNO JOB		      G
> ---------- --------- ----------
> 	10 CLERK	      0
> 	10 MANAGER	      0
> 	10 PRESIDENT	      0
> 	10		      0
> 	20 CLERK	      0
> 	20 ANALYST	      0
> 	20 MANAGER	      0
> 	20		      0
> 	30 CLERK	      0
> 	30 MANAGER	      0
> 	30 SALESMAN	      0
> 	30		      0
> 			      0
> {code} produces grouping sets (deptno, job), (deptno), (). These are distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
>     DEPTNO	    G
> ---------- ----------
> 	10	    0
> 	20	    0
> 	30	    0
> 		    0
> 		    1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the result for each occurrence: the first occurrence has g = 0; the second has g = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)