Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/10/21 09:01:00 UTC

[jira] [Work logged] (HIVE-19653) Incorrect predicate pushdown for groupby with grouping sets

     [ https://issues.apache.org/jira/browse/HIVE-19653?focusedWorklogId=331291&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331291 ]

ASF GitHub Bot logged work on HIVE-19653:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Oct/19 09:00
            Start Date: 21/Oct/19 09:00
    Worklog Time Spent: 10m 
      Work Description: fan624009652 commented on issue #354: HIVE-19653: Incorrect predicate pushdown for groupby with grouping sets
URL: https://github.com/apache/hive/pull/354#issuecomment-544419803
 
 
   Now I'm facing this problem and I wonder why this pull request is still unmerged.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 331291)
    Remaining Estimate: 0h
            Time Spent: 10m

> Incorrect predicate pushdown for groupby with grouping sets
> -----------------------------------------------------------
>
>                 Key: HIVE-19653
>                 URL: https://issues.apache.org/jira/browse/HIVE-19653
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer
>            Reporter: Zhang Li
>            Assignee: Zhang Li
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-19653.1.patch, HIVE-19653.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Consider the following query:
> {code:sql}
> CREATE TABLE T1(a STRING, b STRING, s BIGINT);
> INSERT OVERWRITE TABLE T1 VALUES ('aaaa', 'bbbb', 123456);
> SELECT * FROM (
> SELECT a, b, sum(s)
> FROM T1
> GROUP BY a, b GROUPING SETS ((), (a), (b), (a, b))
> ) t WHERE a IS NOT NULL;
> {code}
> When hive.optimize.ppd is enabled (and hive.cbo.enable=false), the query will output:
> {code}
> NULL	NULL	123456
> NULL	bbbb	123456
> aaaa	NULL	123456
> aaaa	bbbb	123456
> {code}
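> We can see that the predicate "a IS NOT NULL" has no effect, which is incorrect: only the last two rows, where a is 'aaaa', should be returned.
> When performing the PPD optimization for a GBY operator, we should make sure that every grouping set contains the expression being processed before pushing it down. Otherwise, the value of the expression after the GBY differs from its value before it (the GBY outputs NULL for a column that is not part of the current grouping set), and the result is wrong.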
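
The check described above can be sketched as follows. This is only a minimal illustration, not the code from the attached patch or from Hive's optimizer: the class name GroupingSetPushdownCheck, the method canPushDown, and the modelling of grouping sets as sets of column names are all assumptions made for the example.

```java
import java.util.List;
import java.util.Set;

public class GroupingSetPushdownCheck {

    /**
     * Hypothetical helper (not Hive's actual API): a predicate may be pushed
     * below the group-by only if every grouping set contains all of the
     * columns the predicate refers to. If some grouping set omits a column,
     * the group-by emits NULL for it, so the predicate would see a different
     * value above the group-by than below it.
     */
    static boolean canPushDown(Set<String> predicateColumns,
                               List<Set<String>> groupingSets) {
        for (Set<String> groupingSet : groupingSets) {
            if (!groupingSet.containsAll(predicateColumns)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // GROUPING SETS ((), (a), (b), (a, b)) from the query in this issue.
        List<Set<String>> fromIssue = List.of(
                Set.<String>of(), Set.of("a"), Set.of("b"), Set.of("a", "b"));
        // "a IS NOT NULL" refers to column a, which () and (b) do not contain.
        System.out.println(canPushDown(Set.of("a"), fromIssue)); // false

        // If every grouping set contains a, pushing the predicate down is safe.
        List<Set<String>> allContainA = List.of(Set.of("a"), Set.of("a", "b"));
        System.out.println(canPushDown(Set.of("a"), allContainA)); // true
    }
}
```

For the query in this issue the sketch reports that the pushdown is unsafe, because the grouping sets () and (b) do not contain column a.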



--
This message was sent by Atlassian Jira
(v8.3.4#803005)