You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Stamatis Zampetakis (Jira)" <ji...@apache.org> on 2021/04/09 21:52:00 UTC

[jira] [Created] (HIVE-24999) HiveSubQueryRemoveRule generates invalid plan for IN subquery with multiple correlations

Stamatis Zampetakis created HIVE-24999:
------------------------------------------

             Summary: HiveSubQueryRemoveRule generates invalid plan for IN subquery with multiple correlations
                 Key: HIVE-24999
                 URL: https://issues.apache.org/jira/browse/HIVE-24999
             Project: Hive
          Issue Type: Bug
          Components: CBO
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis


The problem can be reproduced by using the following query which at the moment can be found in {{subquery_in.q}} file:

{code:sql}
explain cbo select * from part where p_name IN (select p_name from part p where p.p_size = part.p_size AND part.p_size + 121150 = p.p_partkey );
{code}

The plans before and after {{HiveSubQueryRemoveRule}} are shown below:

{noformat}
2021-04-09T14:29:08,031 DEBUG [9f8b0342-5609-4917-95a9-e7abc884f619 main] parse.CalcitePlanner: Plan before removing subquery:
HiveProject(p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], p_type=[$4], p_size=[$5], p_container=[$6], p_retailprice=[$7], p_comment=[$8])
  HiveFilter(condition=[IN($1, {
HiveProject(p_name=[$1])
  HiveFilter(condition=[AND(=($5, $cor0.p_size), =(+($cor0.p_size, 121150), $0))])
    HiveTableScan(table=[[default, part]], table:alias=[p])
})])
    HiveTableScan(table=[[default, part]], table:alias=[part])

2021-04-09T14:29:08,056 DEBUG [9f8b0342-5609-4917-95a9-e7abc884f619 main] parse.CalcitePlanner: Plan just after removing subquery:
HiveProject(p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], p_type=[$4], p_size=[$5], p_container=[$6], p_retailprice=[$7], p_comment=[$8])
  HiveFilter(condition=[=($1, $12)])
    LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{5}])
      HiveTableScan(table=[[default, part]], table:alias=[part])
      HiveProject(p_name=[$1])
        HiveFilter(condition=[AND(=($5, $cor0.p_size), =(+($cor0.p_size, 121150), $0))])
          HiveTableScan(table=[[default, part]], table:alias=[p])
{noformat}

The plan after applying the rule is invalid. The {{HiveFilter(condition=[=($1, $12)])}} above the correlate references columns ($12) from the right input which do not exist since the correlate is of type SEMI. Running the test with {{-Dcalcite.debug}} property enabled raises an {{AssertionError}} when building the {{HiveFilter}}.

The problem is hidden at the moment since there is a specific hack in {{HiveRelDecorrelator}} that turns this invalid plan into a valid one. This mechanism is very brittle and it can break easily as it happened while fixing HIVE-24957.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)