Posted to issues@hive.apache.org by "Laljo John Pullokkaran (JIRA)" <ji...@apache.org> on 2015/11/30 23:41:11 UTC

[jira] [Commented] (HIVE-12477) CBO: Left Semijoins are incompatible with a cross-product

    [ https://issues.apache.org/jira/browse/HIVE-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032630#comment-15032630 ] 

Laljo John Pullokkaran commented on HIVE-12477:
-----------------------------------------------

It seems like this is induced by constant propagation.
CBO transitive inference adds a filter to both sides of the join, which then causes Hive's constant propagation to remove the ON clause (as it becomes 1=1).

If the above is true, then for the time being we could put a check in Hive's ConstantPropagateProcFactory.ConstantPropagateJoinProc to prevent it from removing the ON clause of a semi-join.
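
As a rough illustration of that stopgap, here is a self-contained sketch. It is not Hive code: SemiJoinGuardSketch, KeyPair, foldJoinKeys and the guardSemiJoin flag are hypothetical names made up for this example, and they do not reflect the real ConstantPropagateJoinProc internals. It only models the assumed mechanism: a key pair whose two columns are pinned to the same constant folds away, and the guard skips that folding for left semi-joins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, NOT Hive's API: shows how folding constant-equal keys
// empties the ON clause, and how a semi-join guard would prevent that.
final class SemiJoinGuardSketch {
    enum JoinType { INNER, LEFT_SEMI }

    /** One equi-join key pair, e.g. ss_sold_date_sk = d_date_sk. */
    static final class KeyPair {
        final String left;
        final String right;
        KeyPair(String left, String right) { this.left = left; this.right = right; }
    }

    /**
     * Drops key pairs whose two columns are pinned to the same constant
     * (the "1 = 1" case). With guardSemiJoin set, a LEFT_SEMI join keeps
     * all of its keys, so the join never ends up with an empty key list.
     */
    static List<KeyPair> foldJoinKeys(JoinType type, List<KeyPair> keys,
                                      Map<String, Integer> constants,
                                      boolean guardSemiJoin) {
        if (guardSemiJoin && type == JoinType.LEFT_SEMI) {
            return new ArrayList<>(keys); // guard: leave the ON clause untouched
        }
        List<KeyPair> remaining = new ArrayList<>();
        for (KeyPair k : keys) {
            Integer l = constants.get(k.left);
            Integer r = constants.get(k.right);
            boolean alwaysTrue = l != null && l.equals(r);
            if (!alwaysTrue) {
                remaining.add(k);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        // Filters inferred on both sides: ss_sold_date_sk = 1 and d_date_sk = 1.
        Map<String, Integer> constants = new HashMap<>();
        constants.put("ss_sold_date_sk", 1);
        constants.put("d_date_sk", 1);
        List<KeyPair> keys = new ArrayList<>();
        keys.add(new KeyPair("ss_sold_date_sk", "d_date_sk"));

        // Without the guard the only key folds away (prints 0, the keyless semi-join).
        System.out.println(foldJoinKeys(JoinType.LEFT_SEMI, keys, constants, false).size());
        // With the guard the key is preserved (prints 1).
        System.out.println(foldJoinKeys(JoinType.LEFT_SEMI, keys, constants, true).size());
    }
}
```

In this model, inner joins are still folded as before. Only the semi-join case is exempted, which matches the narrow scope of the proposed check.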

The correct fix may be to fix the operator itself, but that might take time.

> CBO: Left Semijoins are incompatible with a cross-product
> ---------------------------------------------------------
>
>                 Key: HIVE-12477
>                 URL: https://issues.apache.org/jira/browse/HIVE-12477
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 2.0.0
>            Reporter: Gopal V
>            Assignee: Jesus Camacho Rodriguez
>
> With HIVE-12017 in place, a few queries generate left semi-joins without a key.
> This is an invalid plan and can be produced by running:
> {code}
> explain logical select count(1) from store_sales where ss_sold_date_sk in (select d_date_sk from date_dim where d_date_sk = 1);
> LOGICAL PLAN:  
> $hdt$_0:$hdt$_0:$hdt$_0:store_sales
>   TableScan (TS_0)
>     alias: store_sales
>     filterExpr: (ss_sold_date_sk = 1) (type: boolean)
>     Filter Operator (FIL_20)
>       predicate: (ss_sold_date_sk = 1) (type: boolean)
>       Select Operator (SEL_2)
>         Reduce Output Operator (RS_9)
>           sort order: 
>           Join Operator (JOIN_11)
>             condition map:
>                  Left Semi Join 0 to 1
>             keys:
>               0 
>               1 
>             Group By Operator (GBY_14)
>               aggregations: count(1)
>               mode: hash
> {code}
> Without CBO, the same query produces a semi-join with keys:
> {code}
> sq_1:date_dim
>   TableScan (TS_1)
>     alias: date_dim
>     filterExpr: ((1) IN (RS[6]) and (d_date_sk = 1)) (type: boolean)
>     Filter Operator (FIL_21)
>       predicate: ((1) IN (RS[6]) and (d_date_sk = 1)) (type: boolean)
>       Select Operator (SEL_3)
>         expressions: 1 (type: int)
>         outputColumnNames: _col0
>         Group By Operator (GBY_5)
>           keys: _col0 (type: int)
>           mode: hash
>           outputColumnNames: _col0
>           Reduce Output Operator (RS_8)
>             key expressions: _col0 (type: int)
>             sort order: +
>             Map-reduce partition columns: _col0 (type: int)
>             Join Operator (JOIN_9)
>               condition map:
>                    Left Semi Join 0 to 1
>               keys:
>                 0 ss_sold_date_sk (type: int)
>                 1 _col0 (type: int)
>               Group By Operator (GBY_12)
>                 aggregations: count(1)
>                 mode: hash
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)