You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Vineet Garg (JIRA)" <ji...@apache.org> on 2017/02/02 02:09:51 UTC
[jira] [Commented] (HIVE-15458) Fix semi-join conversion rule for
subquery
[ https://issues.apache.org/jira/browse/HIVE-15458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849292#comment-15849292 ]
Vineet Garg commented on HIVE-15458:
------------------------------------
This is not really subquery issue. This is also reproducible with following query:
{code:SQL} select part.p_type from part join (select p1.p_name from part p1, part p2 group by p1.p_name) pp where pp.p_name = part.p_name; {code}
This will throw following exception in hive log
{noformat}
org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid table alias or column reference '$hdt$_0': (possible column names are: $hdt$_1.p_name, $hdt$_2.dummy)
{noformat}
Note that after throwing this exception HIVE falls back to non-cbo path to execute this query successfully, so beeline/hivecli won't see this error.
Issue is during conversion of calcite plan to AST, specifically following code in {{ASTConverter.java}}
{code}
else if (r instanceof Join) {
Join join = (Join) r;
QueryBlockInfo left = convertSource(join.getLeft());
QueryBlockInfo right = convertSource(join.getRight());
s = new Schema(left.schema, right.schema);
ASTNode cond = join.getCondition().accept(new RexVisitor(s));
boolean semiJoin = join instanceof SemiJoin;
if (join.getRight() instanceof Join) {
// Invert join inputs; this is done because otherwise the SemanticAnalyzer
// methods to merge joins will not kick in
JoinRelType type;
if (join.getJoinType() == JoinRelType.LEFT) {
type = JoinRelType.RIGHT;
} else if (join.getJoinType() == JoinRelType.RIGHT) {
type = JoinRelType.LEFT;
} else {
type = join.getJoinType();
}
ast = ASTBuilder.join(right.ast, left.ast, type, cond, semiJoin);
} else {
ast = ASTBuilder.join(left.ast, right.ast, join.getJoinType(), cond, semiJoin);
}
if (semiJoin) {
s = left.schema;
}
{code}
We should not be inverting join inputs for SEMI join since it change the semantics.
Bypassing this for semi-join produces correct AST but further throws an exception while generating joinTree from AST in {{SemanticAnalyzer::genJoinTree()}}
Plan after semi-join optimization looks like as follow:
{code}
HiveProject(p_type=[$1])
HiveSemiJoin(condition=[=($0, $2)], joinType=[inner])
HiveProject(p_name=[$1], p_type=[$4])
HiveFilter(condition=[IS NOT NULL($1)])
HiveTableScan(table=[[default.part]], table:alias=[part])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveProject(p_name=[$1])
HiveFilter(condition=[IS NOT NULL($1)])
HiveTableScan(table=[[default.part]], table:alias=[p1])
HiveProject(DUMMY=[0])
HiveTableScan(table=[[default.part]], table:alias=[p2])
{code}
Since {{HiveSemiJoin}} has {{HiveJoin}} as it's right input following code in {{SemanticAnalyzer::genJoinTree()}} throws an error
{code}
ASTNode left = (ASTNode) joinParseTree.getChild(0);
ASTNode right = (ASTNode) joinParseTree.getChild(1);
boolean isValidLeftToken = isValidJoinSide(left);
boolean isJoinLeftToken = !isValidLeftToken && isJoinToken(left);
boolean isValidRightToken = isValidJoinSide(right);
boolean isJoinRightToken = !isValidRightToken && isJoinToken(right);
// TODO: if we didn't care about the column order, we could switch join sides here
// for TOK_JOIN and TOK_FULLOUTERJOIN.
if (!isValidLeftToken && !isJoinLeftToken) {
throw new SemanticException("Invalid token on the left side of the join: "
+ left.getToken().getText() + "; please rewrite your query");
} else if (!isValidRightToken) {
String advice= "";
if (isJoinRightToken && !isJoinLeftToken) {
advice = "; for example, put the nested join on the left side, or nest joins differently";
} else if (isJoinRightToken) {
advice = "; for example, nest joins differently";
}
throw new SemanticException("Invalid token on the right side of the join: "
+ right.getToken().getText() + "; please rewrite your query" + advice);
}
{code}
{{genJoinTree}} does not expect it's right input to be another join
{code}
private static boolean isValidJoinSide(ASTNode right) {
return (right.getToken().getType() == HiveParser.TOK_TABREF)
|| (right.getToken().getType() == HiveParser.TOK_SUBQUERY)
|| (right.getToken().getType() == HiveParser.TOK_PTBLFUNCTION);
}
{code}
> Fix semi-join conversion rule for subquery
> ------------------------------------------
>
> Key: HIVE-15458
> URL: https://issues.apache.org/jira/browse/HIVE-15458
> Project: Hive
> Issue Type: Sub-task
> Components: Logical Optimizer
> Reporter: Vineet Garg
> Assignee: Vineet Garg
>
> Subquery code in *CalcitePlanner* turns off *hive.enable.semijoin.conversion* since it doesn't work for subqueries.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)