You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Vladimir Sitnikov (Jira)" <ji...@apache.org> on 2020/01/07 19:15:01 UTC

[jira] [Comment Edited] (CALCITE-1824) GROUP_ID returns wrong result

    [ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010007#comment-17010007 ] 

Vladimir Sitnikov edited comment on CALCITE-1824 at 1/7/20 7:14 PM:
--------------------------------------------------------------------

{quote}The problem with UNION ALL is that it duplicates the input relational expression. So we would tend to implement a plan that evaluates it twice.{quote}
The problem with CROSS JOIN inside GROUP BY is it would likely defeat any optimizer that knows how to optimize regular GROUP BY.

For instance if (...cross join...) group by will be sent via JDBC, the underlying DB won't be able to optimize it properly.


was (Author: vladimirsitnikov):
{quote}The problem with UNION ALL is that it duplicates the input relational expression. So we would tend to implement a plan that evaluates it twice.{quote}
The problem with CROSS JOIN inside GROUP BY is it would likely defeat any optimizer that knows how to optimize regular GROUP BY.

> GROUP_ID returns wrong result
> -----------------------------
>
>                 Key: CALCITE-1824
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1824
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} is useful only if you have duplicate grouping sets. If grouping sets are distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
>     DEPTNO JOB		      G
> ---------- --------- ----------
> 	10 CLERK	      0
> 	10 MANAGER	      0
> 	10 PRESIDENT	      0
> 	10		      0
> 	20 CLERK	      0
> 	20 ANALYST	      0
> 	20 MANAGER	      0
> 	20		      0
> 	30 CLERK	      0
> 	30 MANAGER	      0
> 	30 SALESMAN	      0
> 	30		      0
> 			      0
> {code} produces grouping sets (deptno, job), (deptno), (). These are distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
>     DEPTNO	    G
> ---------- ----------
> 	10	    0
> 	20	    0
> 	30	    0
> 		    0
> 		    1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the result for each occurrence: the first occurrence has g = 0; the second has g = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)