You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Krisztian Kasa (Jira)" <ji...@apache.org> on 2022/03/17 13:20:00 UTC
[jira] [Created] (HIVE-26043) Use constraint info when creating RexNodes

Krisztian Kasa created HIVE-26043:
-------------------------------------

             Summary: Use constraint info when creating RexNodes
                 Key: HIVE-26043
                 URL: https://issues.apache.org/jira/browse/HIVE-26043
             Project: Hive
          Issue Type: Improvement
          Components: CBO
            Reporter: Krisztian Kasa
            Assignee: Krisztian Kasa


Prior HIVE-23100 Not null constraints affected newly created RexNode type nullability.
Nullability enables the subquery rewrite algorithm to generate more optimal plan.
[https://github.com/apache/hive/blob/1213ad3f0ae0e21e7519dc28b8b6d1401cdd1441/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java#L324]

Example:
{code:java}
explain cbo
select ws_sales_price
 from web_sales, customer, item
 where ws_bill_customer_sk = c_customer_sk
 	and ws_item_sk = i_item_sk
 	and ( c_customer_sk = 1
 	      or
 	      i_item_id in (select i_item_id
                             from item
                             where i_item_sk in (2, 3)
                             )
 	    );
{code}
Without not null constraints
{code:java}
HiveProject(ws_sales_price=[$2])
  HiveFilter(condition=[OR(AND(<>($6, 0), IS NOT NULL($8)), =($3, 1))])
    HiveProject(ws_item_sk=[$0], ws_bill_customer_sk=[$1], ws_sales_price=[$2], c_customer_sk=[$8], i_item_sk=[$3], i_item_id=[$4], c=[$5], i_item_id0=[$6], literalTrue=[$7])
      HiveJoin(condition=[=($1, $8)], joinType=[inner], algorithm=[none], cost=[not available])
        HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], cost=[not available])
          HiveProject(ws_item_sk=[$2], ws_bill_customer_sk=[$3], ws_sales_price=[$20])
            HiveFilter(condition=[IS NOT NULL($3)])
              HiveTableScan(table=[[default, web_sales]], table:alias=[web_sales])
          HiveJoin(condition=[=($1, $3)], joinType=[left], algorithm=[none], cost=[not available])
            HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
              HiveProject(i_item_sk=[$0], i_item_id=[$1])
                HiveTableScan(table=[[default, item]], table:alias=[item])
              HiveProject(c=[$0])
                HiveAggregate(group=[{}], c=[COUNT()])
                  HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)])
                    HiveTableScan(table=[[default, item]], table:alias=[item])
            HiveProject(i_item_id=[$0], literalTrue=[true])
              HiveAggregate(group=[{1}])
                HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)])
                  HiveTableScan(table=[[default, item]], table:alias=[item])
        HiveProject(c_customer_sk=[$0])
          HiveTableScan(table=[[default, customer]], table:alias=[customer])
{code}
With not null constraints
{code:java}
HiveProject(ws_sales_price=[$2])
  HiveFilter(condition=[OR(IS NOT NULL($7), =($3, 1))])
    HiveProject(ws_item_sk=[$0], ws_bill_customer_sk=[$1], ws_sales_price=[$2], c_customer_sk=[$7], i_item_sk=[$3], i_item_id=[$4], i_item_id0=[$5], literalTrue=[$6])
      HiveJoin(condition=[=($1, $7)], joinType=[inner], algorithm=[none], cost=[not available])
        HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], cost=[not available])
          HiveProject(ws_item_sk=[$2], ws_bill_customer_sk=[$3], ws_sales_price=[$20])
            HiveFilter(condition=[IS NOT NULL($3)])
              HiveTableScan(table=[[default, web_sales]], table:alias=[web_sales])
          HiveJoin(condition=[=($1, $2)], joinType=[left], algorithm=[none], cost=[not available])
            HiveProject(i_item_sk=[$0], i_item_id=[$1])
              HiveTableScan(table=[[default, item]], table:alias=[item])
            HiveProject(i_item_id=[$0], literalTrue=[true])
              HiveAggregate(group=[{1}])
                HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)])
                  HiveTableScan(table=[[default, item]], table:alias=[item])
        HiveProject(c_customer_sk=[$0])
          HiveTableScan(table=[[default, customer]], table:alias=[customer])
{code}
In the first plan when not null constraints was ignored there is an extra {{item}} table join without join condition:
{code:java}
            HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
              HiveProject(i_item_sk=[$0], i_item_id=[$1])
                HiveTableScan(table=[[default, item]], table:alias=[item])
              HiveProject(c=[$0])
                HiveAggregate(group=[{}], c=[COUNT()])
                  HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)])
                    HiveTableScan(table=[[default, item]], table:alias=[item])

{code}
The planner is not aware that the {{i_item_id}} column has {{not null}} defined and it expects null values which needs the extra join.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)