You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/03/21 08:13:00 UTC
[jira] [Work logged] (HIVE-26043) Use constraint info when creating RexNodes

     [ https://issues.apache.org/jira/browse/HIVE-26043?focusedWorklogId=744870&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-744870 ]

ASF GitHub Bot logged work on HIVE-26043:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Mar/22 08:12
            Start Date: 21/Mar/22 08:12
    Worklog Time Spent: 10m 
      Work Description: kasakrisz opened a new pull request #3120:
URL: https://github.com/apache/hive/pull/3120


   ### What changes were proposed in this pull request?
   Use the data type coming from the Calcite `RelDataType` when creating `RexNode`s from `ColumnInfo` if possible. This process is part of the compilation: converting the query represented by QB objects and AST subtrees to Calcite logical plan.
   
   ### Why are the changes needed?
   When TableScan logical operators are created table constraint information is pulled from the metastore and applied the `RelDataType` associated to the TS operator. 
   Constraints like `not null` has an affect the way Hive rewrites subqueries to joins. Example: if the optimizer is aware that the lhs of an in subquery can not be null a simpler rewrite is chosen. See jira for example plans.
   
   ### Does this PR introduce _any_ user-facing change?
   No, but `explain` command output may be different in older Hive versions.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestTezTPCDS30TBPerfCliDriver -Dqfile=query45.q -pl itests/qtest -Pitests
   mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=subquery_in.q -pl itests/qtest -Pitests
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 744870)
    Remaining Estimate: 0h
            Time Spent: 10m

> Use constraint info when creating RexNodes
> ------------------------------------------
>
>                 Key: HIVE-26043
>                 URL: https://issues.apache.org/jira/browse/HIVE-26043
>             Project: Hive
>          Issue Type: Improvement
>          Components: CBO
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Prior HIVE-23100 Not null constraints affected newly created RexNode type nullability.
> Nullability enables the subquery rewrite algorithm to generate more optimal plan.
> [https://github.com/apache/hive/blob/1213ad3f0ae0e21e7519dc28b8b6d1401cdd1441/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java#L324]
> Example:
> {code:java}
> explain cbo
> select ws_sales_price
>  from web_sales, customer, item
>  where ws_bill_customer_sk = c_customer_sk
>  	and ws_item_sk = i_item_sk
>  	and ( c_customer_sk = 1
>  	      or
>  	      i_item_id in (select i_item_id
>                              from item
>                              where i_item_sk in (2, 3)
>                              )
>  	    );
> {code}
> Without not null constraints
> {code:java}
> HiveProject(ws_sales_price=[$2])
>   HiveFilter(condition=[OR(AND(<>($6, 0), IS NOT NULL($8)), =($3, 1))])
>     HiveProject(ws_item_sk=[$0], ws_bill_customer_sk=[$1], ws_sales_price=[$2], c_customer_sk=[$8], i_item_sk=[$3], i_item_id=[$4], c=[$5], i_item_id0=[$6], literalTrue=[$7])
>       HiveJoin(condition=[=($1, $8)], joinType=[inner], algorithm=[none], cost=[not available])
>         HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], cost=[not available])
>           HiveProject(ws_item_sk=[$2], ws_bill_customer_sk=[$3], ws_sales_price=[$20])
>             HiveFilter(condition=[IS NOT NULL($3)])
>               HiveTableScan(table=[[default, web_sales]], table:alias=[web_sales])
>           HiveJoin(condition=[=($1, $3)], joinType=[left], algorithm=[none], cost=[not available])
>             HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
>               HiveProject(i_item_sk=[$0], i_item_id=[$1])
>                 HiveTableScan(table=[[default, item]], table:alias=[item])
>               HiveProject(c=[$0])
>                 HiveAggregate(group=[{}], c=[COUNT()])
>                   HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)])
>                     HiveTableScan(table=[[default, item]], table:alias=[item])
>             HiveProject(i_item_id=[$0], literalTrue=[true])
>               HiveAggregate(group=[{1}])
>                 HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)])
>                   HiveTableScan(table=[[default, item]], table:alias=[item])
>         HiveProject(c_customer_sk=[$0])
>           HiveTableScan(table=[[default, customer]], table:alias=[customer])
> {code}
> With not null constraints
> {code:java}
> HiveProject(ws_sales_price=[$2])
>   HiveFilter(condition=[OR(IS NOT NULL($7), =($3, 1))])
>     HiveProject(ws_item_sk=[$0], ws_bill_customer_sk=[$1], ws_sales_price=[$2], c_customer_sk=[$7], i_item_sk=[$3], i_item_id=[$4], i_item_id0=[$5], literalTrue=[$6])
>       HiveJoin(condition=[=($1, $7)], joinType=[inner], algorithm=[none], cost=[not available])
>         HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], cost=[not available])
>           HiveProject(ws_item_sk=[$2], ws_bill_customer_sk=[$3], ws_sales_price=[$20])
>             HiveFilter(condition=[IS NOT NULL($3)])
>               HiveTableScan(table=[[default, web_sales]], table:alias=[web_sales])
>           HiveJoin(condition=[=($1, $2)], joinType=[left], algorithm=[none], cost=[not available])
>             HiveProject(i_item_sk=[$0], i_item_id=[$1])
>               HiveTableScan(table=[[default, item]], table:alias=[item])
>             HiveProject(i_item_id=[$0], literalTrue=[true])
>               HiveAggregate(group=[{1}])
>                 HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)])
>                   HiveTableScan(table=[[default, item]], table:alias=[item])
>         HiveProject(c_customer_sk=[$0])
>           HiveTableScan(table=[[default, customer]], table:alias=[customer])
> {code}
> In the first plan when not null constraints was ignored there is an extra {{item}} table join without join condition:
> {code:java}
>             HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
>               HiveProject(i_item_sk=[$0], i_item_id=[$1])
>                 HiveTableScan(table=[[default, item]], table:alias=[item])
>               HiveProject(c=[$0])
>                 HiveAggregate(group=[{}], c=[COUNT()])
>                   HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)])
>                     HiveTableScan(table=[[default, item]], table:alias=[item])
> {code}
> The planner is not aware that the {{i_item_id}} column has {{not null}} defined and it expects null values which needs the extra join.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)