Posted to issues@calcite.apache.org by "Julian Hyde (JIRA)" <ji...@apache.org> on 2017/06/08 17:42:18 UTC

[jira] [Commented] (CALCITE-1069) In Aggregate, deprecate indicators, and allow GROUPING to be used as an aggregate function

    [ https://issues.apache.org/jira/browse/CALCITE-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043110#comment-16043110 ] 

Julian Hyde commented on CALCITE-1069:
--------------------------------------

I have created a pull request. Please review. This could potentially break Hive, so I need a +1 from a developer involved with Hive.

I have endeavored to make this backwards compatible by still allowing Aggregate with indicator = true, but that path is not well tested. I strongly suggest that people convert to indicator = false. There are many benefits; for example, rules that were written for queries without grouping sets should work with grouping sets unchanged or with minor modifications. (See CALCITE-461 for the pain that has caused.)

> In Aggregate, deprecate indicators, and allow GROUPING to be used as an aggregate function
> ------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-1069
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1069
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Hari Sankar Sivarama Subramaniyan
>            Assignee: Julian Hyde
>
> Grouping sets are currently implemented in Calcite using a bit to indicate each
> of the grouping columns. For instance, consider the following group by clause:
> GROUP BY CUBE (a, b)
> The generated Aggregate operator in Calcite will have a row schema consisting of [a, b, GROUPING(a), GROUPING(b)], where GROUPING(x) is a boolean indicator field that represents whether x is participating in the group by clause.
> In contrast, Hive's implementation stores a single number corresponding to the GROUPING bit vector associated with a row (this is the result of the GROUPING_ID function in RDBMSs such as Microsoft SQL Server, Oracle, etc.). Thus, the row schema of the Aggregate operator is [a, b, GROUPING_ID(a, b)].
> This difference creates a mismatch between Calcite and Hive. For now, we work around this mismatch on the Hive side: we create our own GROUPING_ID function applied over those columns. However, we have some issues related to predicate pushdown, constant propagation, the join-project transpose rule (HIVE-12923), etc., that we need to keep solving as new rules are added to the Hive optimizer. In short, this is making the code on the Hive side harder and harder to maintain.
> This jira is intended to modify the implementation on the Calcite side so that we need not make workarounds/hacks in Hive to support Grouping IDs.
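
For readers comparing the two representations described in the issue, here is a minimal Python sketch of how per-column indicators relate to a single GROUPING_ID value for GROUP BY CUBE (a, b). It is an illustration only, not code from Calcite or Hive. It assumes the SQL-standard convention that a bit is 1 when the column is rolled up (absent from the grouping set) and that the leftmost argument is the most significant bit; engines and versions have differed on this, so check the convention of the system you target. All function names are hypothetical.

```python
from itertools import combinations


def cube_grouping_sets(columns):
    """Return every subset of `columns`, largest first (the sets CUBE expands to)."""
    sets = []
    for size in range(len(columns), -1, -1):
        sets.extend(combinations(columns, size))
    return sets


def grouping_bits(columns, grouping_set):
    """One indicator per column: 1 if the column is rolled up, 0 if it is grouped.

    Assumption: SQL-standard GROUPING semantics (1 = not in the grouping set).
    """
    return [0 if column in grouping_set else 1 for column in columns]


def grouping_id(bits):
    """Pack the indicators into one integer, leftmost column most significant."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


columns = ["a", "b"]
for grouping_set in cube_grouping_sets(columns):
    bits = grouping_bits(columns, grouping_set)
    print(grouping_set, bits, grouping_id(bits))
```

The per-column list corresponds to the [a, b, GROUPING(a), GROUPING(b)] layout, and the packed integer to the [a, b, GROUPING_ID(a, b)] layout; either can be derived from the other, which is why a single representation on the Calcite side can serve both.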


