Posted to issues@hive.apache.org by "Ashutosh Chauhan (JIRA)" <ji...@apache.org> on 2016/11/17 20:03:58 UTC

[jira] [Commented] (HIVE-15234) Semijoin cardinality estimation can be improved

    [ https://issues.apache.org/jira/browse/HIVE-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15674667#comment-15674667 ] 

Ashutosh Chauhan commented on HIVE-15234:
-----------------------------------------

There are two ways to address this:
* Get rid of the {{leftSemiJoin}} field in Join and make the stats computation logic work on HiveSemiJoin.
* Get rid of HiveSemiJoin and make all the rules work on the {{leftSemiJoin}} field of Join.

I initially went with approach 2), but quickly found that (all?) current Calcite rules work correctly with SemiJoin, but don't (and cannot) take into account a field hidden in HiveJoin. RelFieldTrimmer and filterJointranspose are the ones I found, but I assume the same holds for many other rules, since otherwise we would get exceptions on current master. Thus I think option 1) is better here. In addition, function signatures will force a dev to handle HiveSemiJoin, whereas a field hidden in the Join rel node won't. Thus having an explicit SemiJoin is more robust.
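
To illustrate the robustness argument, here is a minimal Java sketch. All class and method names in it (FlagJoin, InnerJoin, SemiJoin, JoinVisitor, UpperBoundEstimator) are hypothetical stand-ins and not the actual Calcite or Hive classes. The "estimate" is only a trivial upper bound on output rows (left times right for an inner join, left for a semi join), not Hive's stats logic. The point is that code unaware of a hidden flag still compiles and quietly gives the inner-join answer, while an explicit node type makes a missing semi-join case a compile error.

```java
public class SemiJoinSketch {

    // Approach 2 (hypothetical stand-in): semi-join-ness is a flag on a generic join node.
    static final class FlagJoin {
        final long leftRows;
        final long rightRows;
        final boolean leftSemiJoin;

        FlagJoin(long leftRows, long rightRows, boolean leftSemiJoin) {
            this.leftRows = leftRows;
            this.rightRows = rightRows;
            this.leftSemiJoin = leftSemiJoin;
        }
    }

    // Code written without knowledge of the flag still compiles and silently
    // treats every join as an inner join.
    static long flagUnawareUpperBound(FlagJoin join) {
        return join.leftRows * join.rightRows;
    }

    // Approach 1 (hypothetical stand-in): each join kind is its own node type,
    // and the visitor signature lists every kind that must be handled.
    interface JoinVisitor<R> {
        R visitInner(InnerJoin join);
        R visitSemi(SemiJoin join);
    }

    interface JoinNode {
        <R> R accept(JoinVisitor<R> visitor);
    }

    static final class InnerJoin implements JoinNode {
        final long leftRows;
        final long rightRows;

        InnerJoin(long leftRows, long rightRows) {
            this.leftRows = leftRows;
            this.rightRows = rightRows;
        }

        @Override
        public <R> R accept(JoinVisitor<R> visitor) {
            return visitor.visitInner(this);
        }
    }

    static final class SemiJoin implements JoinNode {
        final long leftRows;
        final long rightRows;

        SemiJoin(long leftRows, long rightRows) {
            this.leftRows = leftRows;
            this.rightRows = rightRows;
        }

        @Override
        public <R> R accept(JoinVisitor<R> visitor) {
            return visitor.visitSemi(this);
        }
    }

    // Leaving out visitSemi here would fail to compile, so the semi-join
    // case cannot be forgotten.
    static final class UpperBoundEstimator implements JoinVisitor<Long> {
        @Override
        public Long visitInner(InnerJoin join) {
            return join.leftRows * join.rightRows; // at most the cross product
        }

        @Override
        public Long visitSemi(SemiJoin join) {
            return join.leftRows; // a semi join emits each left row at most once
        }
    }

    static long upperBound(JoinNode node) {
        return node.accept(new UpperBoundEstimator());
    }

    public static void main(String[] args) {
        // The flag is set, but the flag-unaware code ignores it.
        System.out.println(flagUnawareUpperBound(new FlagJoin(1000, 50, true))); // 50000
        System.out.println(upperBound(new SemiJoin(1000, 50)));                  // 1000
        System.out.println(upperBound(new InnerJoin(1000, 50)));                 // 50000
    }
}
```

In the flag-based variant nothing forces flagUnawareUpperBound to look at leftSemiJoin, which is the kind of silent miss described above. In the explicit-node variant the type system does that enforcement.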

> Semijoin cardinality estimation can be improved
> -----------------------------------------------
>
>                 Key: HIVE-15234
>                 URL: https://issues.apache.org/jira/browse/HIVE-15234
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO, Logical Optimizer
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>
> Currently, Calcite optimization rules rely on (Hive)SemiJoin to represent a semi-join node, whereas stats estimation uses the {{leftSemiJoin}} field of Join. As a result, the semi-join-specific stats calculation logic is never hit, since at plan generation time a HiveSemiJoin is created and the {{leftSemiJoin}} field of Join is never set.


