You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Krisztian Kasa (Jira)" <ji...@apache.org> on 2020/04/17 09:50:00 UTC

[jira] [Commented] (HIVE-23206) Project not defined correctly after reordering a join

    [ https://issues.apache.org/jira/browse/HIVE-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085607#comment-17085607 ] 

Krisztian Kasa commented on HIVE-23206:
---------------------------------------

I used the simplified query that [~scarlin] mentioned in the description to reproduce this.
Plan before *applyJoinOrderingTransform*:
{code}
HiveProject(ws_order_number=[$0])
  HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not available])
    HiveProject(ws_order_number=[$0])
      HiveFilter(condition=[IS NOT NULL($0)])
        HiveTableScan(table=[[default, web_sales]], table:alias=[ws1])
    HiveAggregate(group=[{0}])
      HiveProject(wr_order_number=[$0])

        HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not available])
          HiveProject(wr_order_number=[$0])
            HiveFilter(condition=[IS NOT NULL($0)])

              HiveTableScan(table=[[default, web_returns]], table:alias=[web_returns])
          HiveProject(ws_order_number=[$0])
            HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not available])
              HiveProject(ws_order_number=[$0])
                HiveFilter(condition=[IS NOT NULL($0)])
                  HiveTableScan(table=[[default, web_sales]], table:alias=[ws1])
              HiveProject(wr_order_number=[$0])
                HiveFilter(condition=[IS NOT NULL($0)])
                  HiveTableScan(table=[[default, web_returns]], table:alias=[wr2])
{code}

Plan after *HiveJoinProjectTransposeRule.RIGHT_PROJECT_BTW_JOIN*:
{code}
HiveProject(ws_order_number=[$0])
  HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not available])
    HiveProject(ws_order_number=[$0])
      HiveFilter(condition=[IS NOT NULL($0)])
        HiveTableScan(table=[[default, web_sales]], table:alias=[ws1])
    HiveAggregate(group=[{0}])
      HiveProject(wr_order_number=[$0])

        HiveProject(wr_order_number=[$0], ws_order_number=[$6])
          HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[none], cost=[not available])
            HiveFilter(condition=[IS NOT NULL($0)])

              HiveTableScan(table=[[default, web_returns]], table:alias=[web_returns])
            HiveProject(ws_order_number=[$0])
              HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not available])
                HiveProject(ws_order_number=[$0])
                  HiveFilter(condition=[IS NOT NULL($0)])
                    HiveTableScan(table=[[default, web_sales]], table:alias=[ws1])
                HiveProject(wr_order_number=[$0])
                  HiveFilter(condition=[IS NOT NULL($0)])
                    HiveTableScan(table=[[default, web_returns]], table:alias=[wr2])
{code}

The project under the Join was pulled up on top of it and as a result there is no Project anymore between the Join and the Filter operator:
{code}
        HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not available])
          HiveProject(wr_order_number=[$0])
            HiveFilter(condition=[IS NOT NULL($0)])
{code}
{code}
        HiveProject(wr_order_number=[$0], ws_order_number=[$6])
          HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[none], cost=[not available])
            HiveFilter(condition=[IS NOT NULL($0)])
{code}

Later in *PlanModifierForASTConv* a new Project is introduced between the Join and the Filter
https://github.com/apache/hive/blob/c4023c5a23c21933d68187b7c57b9b9c4f3cbdbb/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/PlanModifierForASTConv.java#L249

This method takes all the columns from the Filter operator that is why the Project contains all of them.


> Project not defined correctly after reordering a join
> -----------------------------------------------------
>
>                 Key: HIVE-23206
>                 URL: https://issues.apache.org/jira/browse/HIVE-23206
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>            Reporter: Steve Carlin
>            Assignee: Krisztian Kasa
>            Priority: Major
>
> The following highlighted line seems to be incorrect in the test suite:
> [https://github.com/apache/hive/blob/master/ql/src/test/results/clientpositive/perf/tez/cbo_query95.q.out#L89]
> Note that the project takes all the columns from the table scan, yet it only needs a couple of them.
> I did some very small debugging on this.  When I removed the applyJoinOrderingTransform here: [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L1897]
> ... the problem goes away.  So presumably one of the rules in there is causing the problem.
> Here is a slightly simplified version of the query which has the same problem (using the same tpc-ds database):
> explain cbo with ws_wh as
> (select ws1.ws_order_number
> from web_sales ws1,web_returns wr2 
> where ws1.ws_order_number = wr2.wr_order_number)
> select 
>    ws_order_number
> from
>    web_sales ws1 
> where
> ws1.ws_order_number in (select wr_order_number
>                             from web_returns,ws_wh
>                             where wr_order_number = ws_wh.ws_order_number)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)