You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Vineet Garg (JIRA)" <ji...@apache.org> on 2017/02/02 02:09:51 UTC

[jira] [Commented] (HIVE-15458) Fix semi-join conversion rule for subquery

    [ https://issues.apache.org/jira/browse/HIVE-15458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849292#comment-15849292 ] 

Vineet Garg commented on HIVE-15458:
------------------------------------

This is not really subquery issue. This is also reproducible with following query:
{code:SQL} select part.p_type from part join (select p1.p_name from part p1, part p2 group by p1.p_name) pp where pp.p_name = part.p_name; {code}

This will throw following exception in hive log
{noformat}
org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid table alias or column reference '$hdt$_0': (possible column names are: $hdt$_1.p_name, $hdt$_2.dummy)
{noformat}

Note that after throwing this exception HIVE falls back to non-cbo path to execute this query successfully, so beeline/hivecli won't see this error.

Issue is during conversion of calcite plan to AST, specifically following code in {{ASTConverter.java}}

{code}
else if (r instanceof Join) {
      Join join = (Join) r;
      QueryBlockInfo left = convertSource(join.getLeft());
      QueryBlockInfo right = convertSource(join.getRight());
      s = new Schema(left.schema, right.schema);
      ASTNode cond = join.getCondition().accept(new RexVisitor(s));
      boolean semiJoin = join instanceof SemiJoin;
      if (join.getRight() instanceof Join) {
        // Invert join inputs; this is done because otherwise the SemanticAnalyzer
        // methods to merge joins will not kick in
        JoinRelType type;
        if (join.getJoinType() == JoinRelType.LEFT) {
          type = JoinRelType.RIGHT;
        } else if (join.getJoinType() == JoinRelType.RIGHT) {
          type = JoinRelType.LEFT;
        } else {
          type = join.getJoinType();
        }
        ast = ASTBuilder.join(right.ast, left.ast, type, cond, semiJoin);
      } else {
        ast = ASTBuilder.join(left.ast, right.ast, join.getJoinType(), cond, semiJoin);
      }
      if (semiJoin) {
        s = left.schema;
      }
{code}

We should not be inverting join inputs for SEMI join since it change the semantics.

Bypassing this for semi-join produces correct AST but further throws an exception while generating joinTree  from AST in {{SemanticAnalyzer::genJoinTree()}}

Plan after semi-join optimization looks like as follow:

{code}
HiveProject(p_type=[$1])
  HiveSemiJoin(condition=[=($0, $2)], joinType=[inner])
    HiveProject(p_name=[$1], p_type=[$4])
      HiveFilter(condition=[IS NOT NULL($1)])
        HiveTableScan(table=[[default.part]], table:alias=[part])
    HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
      HiveProject(p_name=[$1])
        HiveFilter(condition=[IS NOT NULL($1)])
          HiveTableScan(table=[[default.part]], table:alias=[p1])
      HiveProject(DUMMY=[0])
        HiveTableScan(table=[[default.part]], table:alias=[p2])
{code}


Since {{HiveSemiJoin}} has {{HiveJoin}} as it's right input following code in {{SemanticAnalyzer::genJoinTree()}} throws an error

{code}
ASTNode left = (ASTNode) joinParseTree.getChild(0);
    ASTNode right = (ASTNode) joinParseTree.getChild(1);
    boolean isValidLeftToken = isValidJoinSide(left);
    boolean isJoinLeftToken = !isValidLeftToken && isJoinToken(left);
    boolean isValidRightToken = isValidJoinSide(right);
    boolean isJoinRightToken = !isValidRightToken && isJoinToken(right);
    // TODO: if we didn't care about the column order, we could switch join sides here
    //       for TOK_JOIN and TOK_FULLOUTERJOIN.
    if (!isValidLeftToken && !isJoinLeftToken) {
      throw new SemanticException("Invalid token on the left side of the join: "
          + left.getToken().getText() + "; please rewrite your query");
    } else if (!isValidRightToken) {
      String advice= "";
      if (isJoinRightToken && !isJoinLeftToken) {
        advice = "; for example, put the nested join on the left side, or nest joins differently";
      } else if (isJoinRightToken) {
        advice = "; for example, nest joins differently";
      }
      throw new SemanticException("Invalid token on the right side of the join: "
      + right.getToken().getText() + "; please rewrite your query" + advice);
    }
{code}

{{genJoinTree}} does not expect it's right input to be another join

{code}
private static boolean isValidJoinSide(ASTNode right) {
    return (right.getToken().getType() == HiveParser.TOK_TABREF)
        || (right.getToken().getType() == HiveParser.TOK_SUBQUERY)
        || (right.getToken().getType() == HiveParser.TOK_PTBLFUNCTION);
  }
{code}

> Fix semi-join conversion rule for subquery
> ------------------------------------------
>
>                 Key: HIVE-15458
>                 URL: https://issues.apache.org/jira/browse/HIVE-15458
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Logical Optimizer
>            Reporter: Vineet Garg
>            Assignee: Vineet Garg
>
> Subquery code in *CalcitePlanner* turns off *hive.enable.semijoin.conversion* since it doesn't work for subqueries.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)