You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/05/20 15:46:05 UTC

[GitHub] [hive] soumyakanti3578 opened a new pull request #2302: [HIVE-25090] Join condition parsing error in subquery

soumyakanti3578 opened a new pull request #2302:
URL: https://github.com/apache/hive/pull/2302


   This is a draft PR to resolve parsing errors in subquery with correlated join conditions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor merged pull request #2302: [HIVE-25183] Inner Join condition parsing error in subquery

Posted by GitBox <gi...@apache.org>.
jcamachor merged pull request #2302:
URL: https://github.com/apache/hive/pull/2302


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #2302: [HIVE-25183] Inner Join condition parsing error in subquery

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #2302:
URL: https://github.com/apache/hive/pull/2302#discussion_r643653073



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/type/JoinCondTypeCheckProcFactory.java
##########
@@ -104,12 +105,20 @@ private boolean hasTableAlias(JoinTypeCheckCtx ctx, String tabName, ASTNode expr
           tblAliasCnt++;
       }
 
+      if (tblAliasCnt == 0 && ctx.getOuterRR() != null) {

Review comment:
       Why do we do this check? Maybe add a comment to the code.

##########
File path: ql/src/test/queries/clientpositive/subquery_corr_join.q
##########
@@ -0,0 +1,69 @@
+create table alltypestiny(
+id int,
+int_col int,
+bigint_col bigint,
+bool_col boolean
+);
+
+insert into alltypestiny(id, int_col, bigint_col, bool_col) values
+(1, 1, 10, true),
+(2, 4, 5, false),
+(3, 5, 15, true),
+(10, 10, 30, false);
+
+create table alltypesagg(
+id int,
+int_col int,
+bool_col boolean
+);
+
+insert into alltypesagg(id, int_col, bool_col) values
+(1, 1, true),
+(2, 4, false),
+(5, 6, true),
+(null, null, false);
+
+select *
+from alltypesagg t1
+where t1.id not in
+    (select tt1.id
+     from alltypestiny tt1 inner JOIN alltypesagg tt2

Review comment:
       Can we add a q file with a negative test for outer joins? That will be useful to make sure that the query will fail for the time being, as expected.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterJoinRule.java
##########
@@ -90,6 +90,36 @@ public boolean matches(RelOptRuleCall call) {
 
   }
 
+  /**
+   * Rule that tries to push join conditions into its inputs
+   */
+  public static class HiveJoinConditionPushRule extends HiveFilterJoinRule {

Review comment:
       Isn't this the same as `HiveFilterJoinTransposeRule`? It should not be necessary.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java
##########
@@ -34,6 +34,7 @@
 
 import javax.annotation.Nonnull;
 
+import org.apache.calcite.adapter.enumerable.EnumerableConvention;

Review comment:
       This does not seem needed?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/type/JoinCondTypeCheckProcFactory.java
##########
@@ -194,6 +207,19 @@ private ColumnInfo getColInfo(JoinTypeCheckCtx ctx, String tabName, String colAl
         }
       }
 
+      if (cInfoToRet == null && ctx.getOuterRR() != null) {
+        for (RowResolver rr : ImmutableList.of(ctx.getOuterRR())) {

Review comment:
       Is the `ImmutableList.of` wrapping needed?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] soumyakanti3578 commented on a change in pull request #2302: [HIVE-25183] Inner Join condition parsing error in subquery

Posted by GitBox <gi...@apache.org>.
soumyakanti3578 commented on a change in pull request #2302:
URL: https://github.com/apache/hive/pull/2302#discussion_r644341555



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterJoinRule.java
##########
@@ -90,6 +90,36 @@ public boolean matches(RelOptRuleCall call) {
 
   }
 
+  /**
+   * Rule that tries to push join conditions into its inputs
+   */
+  public static class HiveJoinConditionPushRule extends HiveFilterJoinRule {

Review comment:
       Fixed!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #2302: [HIVE-25183] Inner Join condition parsing error in subquery

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #2302:
URL: https://github.com/apache/hive/pull/2302#discussion_r644425512



##########
File path: ql/src/test/results/clientnegative/subquery_corr_outer.q.out
##########
@@ -0,0 +1,74 @@
+PREHOOK: query: create table alltypestiny(
+id int,
+int_col int,
+bigint_col bigint,
+bool_col boolean
+)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@alltypestiny
+POSTHOOK: query: create table alltypestiny(
+id int,
+int_col int,
+bigint_col bigint,
+bool_col boolean
+)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@alltypestiny
+PREHOOK: query: insert into alltypestiny(id, int_col, bigint_col, bool_col) values
+(1, 1, 10, true),
+(2, 4, 5, false),
+(3, 5, 15, true),
+(10, 10, 30, false)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@alltypestiny
+POSTHOOK: query: insert into alltypestiny(id, int_col, bigint_col, bool_col) values
+(1, 1, 10, true),
+(2, 4, 5, false),
+(3, 5, 15, true),
+(10, 10, 30, false)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@alltypestiny
+POSTHOOK: Lineage: alltypestiny.bigint_col SCRIPT []
+POSTHOOK: Lineage: alltypestiny.bool_col SCRIPT []
+POSTHOOK: Lineage: alltypestiny.id SCRIPT []
+POSTHOOK: Lineage: alltypestiny.int_col SCRIPT []
+PREHOOK: query: create table alltypesagg(
+id int,
+int_col int,
+bool_col boolean
+)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@alltypesagg
+POSTHOOK: query: create table alltypesagg(
+id int,
+int_col int,
+bool_col boolean
+)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@alltypesagg
+PREHOOK: query: insert into alltypesagg(id, int_col, bool_col) values
+(1, 1, true),
+(2, 4, false),
+(5, 6, true),
+(null, null, false)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@alltypesagg
+POSTHOOK: query: insert into alltypesagg(id, int_col, bool_col) values
+(1, 1, true),
+(2, 4, false),
+(5, 6, true),
+(null, null, false)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@alltypesagg
+POSTHOOK: Lineage: alltypesagg.bool_col SCRIPT []
+POSTHOOK: Lineage: alltypesagg.id SCRIPT []
+POSTHOOK: Lineage: alltypesagg.int_col SCRIPT []
+FAILED: IndexOutOfBoundsException Index: 2, Size: 2

Review comment:
       This error is not very descriptive. We should be proactive throwing an error 'Correlated subqueries in outer join conditions not supported yet' or anything along those lines. This can probably done either in the parsing logic, decorrelation logic, etc., it's up to you.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] soumyakanti3578 commented on a change in pull request #2302: [HIVE-25183] Inner Join condition parsing error in subquery

Posted by GitBox <gi...@apache.org>.
soumyakanti3578 commented on a change in pull request #2302:
URL: https://github.com/apache/hive/pull/2302#discussion_r644342211



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/type/JoinCondTypeCheckProcFactory.java
##########
@@ -194,6 +207,19 @@ private ColumnInfo getColInfo(JoinTypeCheckCtx ctx, String tabName, String colAl
         }
       }
 
+      if (cInfoToRet == null && ctx.getOuterRR() != null) {
+        for (RowResolver rr : ImmutableList.of(ctx.getOuterRR())) {

Review comment:
       I cleaned this a bit now




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #2302: [HIVE-25090] Join condition parsing error in subquery

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #2302:
URL: https://github.com/apache/hive/pull/2302#discussion_r638272749



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinExtractFilterRule.java
##########
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.calcite.tools.RelBuilderFactory;
+
+public class HiveJoinExtractFilterRule extends RelOptRule {
+  public HiveJoinExtractFilterRule(Class<? extends Join> clazz,
+                                   RelBuilderFactory relBuilderFactory) {
+    super(operand(clazz, any()), relBuilderFactory, null);
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+    final Join join = call.rel(0);

Review comment:
       Shouldn't you check the type of join? If this is a (left/right/full) outer join, JOIN with condition will not be equivalent to cartesian product with a filter with the same condition on top. In SQL:
   ```
   p1 LEFT OUTER JOIN p2 ON (p1.a = p2.b) != p1 LEFT OUTER JOIN p2 WHERE (p1.a = p2.b)
   ```

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java
##########
@@ -103,9 +104,11 @@ private static void traverseFilter(RexNode node, Set<CorrelationId> allVars) {
           RelNode input = ((RexSubQuery)node).rel.getInput(0);
           while(input != null && !(input instanceof HiveFilter)
                   && input.getInputs().size() >=1) {
-              //we don't expect corr vars withing JOIN or UNION for now
-              // we only expect cor vars in top level filter
+              //we don't expect corr vars within UNION for now
               if(input.getInputs().size() > 1) {
+                if (input instanceof HiveJoin) {
+                  findCorrelatedVar(((HiveJoin) input).getJoinFilter(), allVars);
+                }
                   return;

Review comment:
       nit. fix indentation




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] soumyakanti3578 commented on a change in pull request #2302: [HIVE-25183] Inner Join condition parsing error in subquery

Posted by GitBox <gi...@apache.org>.
soumyakanti3578 commented on a change in pull request #2302:
URL: https://github.com/apache/hive/pull/2302#discussion_r644341372



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java
##########
@@ -34,6 +34,7 @@
 
 import javax.annotation.Nonnull;
 
+import org.apache.calcite.adapter.enumerable.EnumerableConvention;

Review comment:
       True, I missed that!

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterJoinRule.java
##########
@@ -90,6 +90,36 @@ public boolean matches(RelOptRuleCall call) {
 
   }
 
+  /**
+   * Rule that tries to push join conditions into its inputs
+   */
+  public static class HiveJoinConditionPushRule extends HiveFilterJoinRule {

Review comment:
       Fixed!

##########
File path: ql/src/test/queries/clientpositive/subquery_corr_join.q
##########
@@ -0,0 +1,69 @@
+create table alltypestiny(
+id int,
+int_col int,
+bigint_col bigint,
+bool_col boolean
+);
+
+insert into alltypestiny(id, int_col, bigint_col, bool_col) values
+(1, 1, 10, true),
+(2, 4, 5, false),
+(3, 5, 15, true),
+(10, 10, 30, false);
+
+create table alltypesagg(
+id int,
+int_col int,
+bool_col boolean
+);
+
+insert into alltypesagg(id, int_col, bool_col) values
+(1, 1, true),
+(2, 4, false),
+(5, 6, true),
+(null, null, false);
+
+select *
+from alltypesagg t1
+where t1.id not in
+    (select tt1.id
+     from alltypestiny tt1 inner JOIN alltypesagg tt2

Review comment:
       Added `subquery_corr_outer.q` to test LEFT JOIN

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/type/JoinCondTypeCheckProcFactory.java
##########
@@ -194,6 +207,19 @@ private ColumnInfo getColInfo(JoinTypeCheckCtx ctx, String tabName, String colAl
         }
       }
 
+      if (cInfoToRet == null && ctx.getOuterRR() != null) {
+        for (RowResolver rr : ImmutableList.of(ctx.getOuterRR())) {

Review comment:
       I cleaned this a bit now

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/type/JoinCondTypeCheckProcFactory.java
##########
@@ -104,12 +105,20 @@ private boolean hasTableAlias(JoinTypeCheckCtx ctx, String tabName, ASTNode expr
           tblAliasCnt++;
       }
 
+      if (tblAliasCnt == 0 && ctx.getOuterRR() != null) {

Review comment:
       Added a comment, and cleaned up the code here




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] soumyakanti3578 commented on a change in pull request #2302: [HIVE-25183] Inner Join condition parsing error in subquery

Posted by GitBox <gi...@apache.org>.
soumyakanti3578 commented on a change in pull request #2302:
URL: https://github.com/apache/hive/pull/2302#discussion_r644341873



##########
File path: ql/src/test/queries/clientpositive/subquery_corr_join.q
##########
@@ -0,0 +1,69 @@
+create table alltypestiny(
+id int,
+int_col int,
+bigint_col bigint,
+bool_col boolean
+);
+
+insert into alltypestiny(id, int_col, bigint_col, bool_col) values
+(1, 1, 10, true),
+(2, 4, 5, false),
+(3, 5, 15, true),
+(10, 10, 30, false);
+
+create table alltypesagg(
+id int,
+int_col int,
+bool_col boolean
+);
+
+insert into alltypesagg(id, int_col, bool_col) values
+(1, 1, true),
+(2, 4, false),
+(5, 6, true),
+(null, null, false);
+
+select *
+from alltypesagg t1
+where t1.id not in
+    (select tt1.id
+     from alltypestiny tt1 inner JOIN alltypesagg tt2

Review comment:
       Added `subquery_corr_outer.q` to test LEFT JOIN




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] soumyakanti3578 commented on a change in pull request #2302: [HIVE-25090] Join condition parsing error in subquery

Posted by GitBox <gi...@apache.org>.
soumyakanti3578 commented on a change in pull request #2302:
URL: https://github.com/apache/hive/pull/2302#discussion_r636302618



##########
File path: ql/src/test/results/clientpositive/llap/subquery_corr_join.q.out
##########
@@ -0,0 +1,212 @@
+PREHOOK: query: create table alltypestiny(
+id int,
+int_col int,
+bigint_col bigint,
+bool_col boolean
+)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@alltypestiny
+POSTHOOK: query: create table alltypestiny(
+id int,
+int_col int,
+bigint_col bigint,
+bool_col boolean
+)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@alltypestiny
+PREHOOK: query: insert into alltypestiny(id, int_col, bigint_col, bool_col) values
+(1, 1, 10, true),
+(2, 4, 5, false),
+(3, 5, 15, true),
+(10, 10, 30, false)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@alltypestiny
+POSTHOOK: query: insert into alltypestiny(id, int_col, bigint_col, bool_col) values
+(1, 1, 10, true),
+(2, 4, 5, false),
+(3, 5, 15, true),
+(10, 10, 30, false)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@alltypestiny
+POSTHOOK: Lineage: alltypestiny.bigint_col SCRIPT []
+POSTHOOK: Lineage: alltypestiny.bool_col SCRIPT []
+POSTHOOK: Lineage: alltypestiny.id SCRIPT []
+POSTHOOK: Lineage: alltypestiny.int_col SCRIPT []
+PREHOOK: query: create table alltypesagg(
+id int,
+int_col int,
+bool_col boolean
+)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@alltypesagg
+POSTHOOK: query: create table alltypesagg(
+id int,
+int_col int,
+bool_col boolean
+)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@alltypesagg
+PREHOOK: query: insert into alltypesagg(id, int_col, bool_col) values
+(1, 1, true),
+(2, 4, false),
+(5, 6, true),
+(null, null, false)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@alltypesagg
+POSTHOOK: query: insert into alltypesagg(id, int_col, bool_col) values
+(1, 1, true),
+(2, 4, false),
+(5, 6, true),
+(null, null, false)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@alltypesagg
+POSTHOOK: Lineage: alltypesagg.bool_col SCRIPT []
+POSTHOOK: Lineage: alltypesagg.id SCRIPT []
+POSTHOOK: Lineage: alltypesagg.int_col SCRIPT []
+Warning: Shuffle Join MERGEJOIN[64][tables = [$hdt$_0, $hdt$_1, $hdt$_2]] in Stage 'Reducer 3' is a cross product
+PREHOOK: query: explain cbo select *
+from alltypesagg t1
+where t1.id not in
+    (select tt1.id
+     from alltypestiny tt1 left JOIN alltypesagg tt2
+     on tt1.int_col = tt2.int_col)
+PREHOOK: type: QUERY
+PREHOOK: Input: default@alltypesagg
+PREHOOK: Input: default@alltypestiny
+#### A masked pattern was here ####
+POSTHOOK: query: explain cbo select *
+from alltypesagg t1
+where t1.id not in
+    (select tt1.id
+     from alltypestiny tt1 left JOIN alltypesagg tt2
+     on tt1.int_col = tt2.int_col)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@alltypesagg
+POSTHOOK: Input: default@alltypestiny
+#### A masked pattern was here ####
+CBO PLAN:
+HiveProject(id=[$0], int_col=[$1], bool_col=[$2])
+  HiveFilter(condition=[OR(=($3, 0), AND(IS NULL($6), >=($4, $3), IS NOT NULL($0)))])
+    HiveProject(id=[$0], int_col=[$1], bool_col=[$2], c=[$5], ck=[$6], id0=[$3], literalTrue=[$4])
+      HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
+        HiveJoin(condition=[=($0, $3)], joinType=[left], algorithm=[none], cost=[not available])
+          HiveProject(id=[$0], int_col=[$1], bool_col=[$2])
+            HiveTableScan(table=[[default, alltypesagg]], table:alias=[t1])
+          HiveProject(id=[$0], literalTrue=[true])
+            HiveAggregate(group=[{0}])
+              HiveJoin(condition=[=($1, $2)], joinType=[left], algorithm=[none], cost=[not available])
+                HiveProject(id=[$0], int_col=[$1])
+                  HiveFilter(condition=[IS NOT NULL($0)])
+                    HiveTableScan(table=[[default, alltypestiny]], table:alias=[tt1])
+                HiveProject(int_col=[$1])
+                  HiveFilter(condition=[IS NOT NULL($1)])
+                    HiveTableScan(table=[[default, alltypesagg]], table:alias=[tt2])
+        HiveProject(c=[$0], ck=[$1])
+          HiveAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)])
+            HiveJoin(condition=[=($1, $2)], joinType=[left], algorithm=[none], cost=[not available])
+              HiveProject(id=[$0], int_col=[$1])
+                HiveTableScan(table=[[default, alltypestiny]], table:alias=[tt1])
+              HiveProject(int_col=[$1])
+                HiveFilter(condition=[IS NOT NULL($1)])
+                  HiveTableScan(table=[[default, alltypesagg]], table:alias=[tt2])
+
+Warning: Shuffle Join MERGEJOIN[64][tables = [$hdt$_0, $hdt$_1, $hdt$_2]] in Stage 'Reducer 3' is a cross product
+PREHOOK: query: select *
+from alltypesagg t1
+where t1.id not in
+    (select tt1.id
+     from alltypestiny tt1 left JOIN alltypesagg tt2
+     on tt1.int_col = tt2.int_col)
+PREHOOK: type: QUERY
+PREHOOK: Input: default@alltypesagg
+PREHOOK: Input: default@alltypestiny
+#### A masked pattern was here ####
+POSTHOOK: query: select *
+from alltypesagg t1
+where t1.id not in
+    (select tt1.id
+     from alltypestiny tt1 left JOIN alltypesagg tt2
+     on tt1.int_col = tt2.int_col)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@alltypesagg
+POSTHOOK: Input: default@alltypestiny
+#### A masked pattern was here ####
+5	6	true

Review comment:
       This is correct.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] soumyakanti3578 commented on a change in pull request #2302: [HIVE-25183] Inner Join condition parsing error in subquery

Posted by GitBox <gi...@apache.org>.
soumyakanti3578 commented on a change in pull request #2302:
URL: https://github.com/apache/hive/pull/2302#discussion_r644342371



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/type/JoinCondTypeCheckProcFactory.java
##########
@@ -104,12 +105,20 @@ private boolean hasTableAlias(JoinTypeCheckCtx ctx, String tabName, ASTNode expr
           tblAliasCnt++;
       }
 
+      if (tblAliasCnt == 0 && ctx.getOuterRR() != null) {

Review comment:
       Added a comment, and cleaned up the code here




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #2302: [HIVE-25183] Inner Join condition parsing error in subquery

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #2302:
URL: https://github.com/apache/hive/pull/2302#discussion_r644425512



##########
File path: ql/src/test/results/clientnegative/subquery_corr_outer.q.out
##########
@@ -0,0 +1,74 @@
+PREHOOK: query: create table alltypestiny(
+id int,
+int_col int,
+bigint_col bigint,
+bool_col boolean
+)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@alltypestiny
+POSTHOOK: query: create table alltypestiny(
+id int,
+int_col int,
+bigint_col bigint,
+bool_col boolean
+)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@alltypestiny
+PREHOOK: query: insert into alltypestiny(id, int_col, bigint_col, bool_col) values
+(1, 1, 10, true),
+(2, 4, 5, false),
+(3, 5, 15, true),
+(10, 10, 30, false)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@alltypestiny
+POSTHOOK: query: insert into alltypestiny(id, int_col, bigint_col, bool_col) values
+(1, 1, 10, true),
+(2, 4, 5, false),
+(3, 5, 15, true),
+(10, 10, 30, false)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@alltypestiny
+POSTHOOK: Lineage: alltypestiny.bigint_col SCRIPT []
+POSTHOOK: Lineage: alltypestiny.bool_col SCRIPT []
+POSTHOOK: Lineage: alltypestiny.id SCRIPT []
+POSTHOOK: Lineage: alltypestiny.int_col SCRIPT []
+PREHOOK: query: create table alltypesagg(
+id int,
+int_col int,
+bool_col boolean
+)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@alltypesagg
+POSTHOOK: query: create table alltypesagg(
+id int,
+int_col int,
+bool_col boolean
+)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@alltypesagg
+PREHOOK: query: insert into alltypesagg(id, int_col, bool_col) values
+(1, 1, true),
+(2, 4, false),
+(5, 6, true),
+(null, null, false)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@alltypesagg
+POSTHOOK: query: insert into alltypesagg(id, int_col, bool_col) values
+(1, 1, true),
+(2, 4, false),
+(5, 6, true),
+(null, null, false)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@alltypesagg
+POSTHOOK: Lineage: alltypesagg.bool_col SCRIPT []
+POSTHOOK: Lineage: alltypesagg.id SCRIPT []
+POSTHOOK: Lineage: alltypesagg.int_col SCRIPT []
+FAILED: IndexOutOfBoundsException Index: 2, Size: 2

Review comment:
       This error is not very descriptive. We should be proactive throwing an error 'Correlated subqueries in outer join conditions not supported yet' or anything along those lines. This can probably done either in the parsing logic, decorrelation logic, etc., it's up to you.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] soumyakanti3578 commented on a change in pull request #2302: [HIVE-25090] Join condition parsing error in subquery

Posted by GitBox <gi...@apache.org>.
soumyakanti3578 commented on a change in pull request #2302:
URL: https://github.com/apache/hive/pull/2302#discussion_r636287623



##########
File path: ql/src/test/results/clientpositive/llap/subquery_corr_join.q.out
##########
@@ -0,0 +1,212 @@
+PREHOOK: query: create table alltypestiny(
+id int,
+int_col int,
+bigint_col bigint,
+bool_col boolean
+)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@alltypestiny
+POSTHOOK: query: create table alltypestiny(
+id int,
+int_col int,
+bigint_col bigint,
+bool_col boolean
+)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@alltypestiny
+PREHOOK: query: insert into alltypestiny(id, int_col, bigint_col, bool_col) values
+(1, 1, 10, true),
+(2, 4, 5, false),
+(3, 5, 15, true),
+(10, 10, 30, false)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@alltypestiny
+POSTHOOK: query: insert into alltypestiny(id, int_col, bigint_col, bool_col) values
+(1, 1, 10, true),
+(2, 4, 5, false),
+(3, 5, 15, true),
+(10, 10, 30, false)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@alltypestiny
+POSTHOOK: Lineage: alltypestiny.bigint_col SCRIPT []
+POSTHOOK: Lineage: alltypestiny.bool_col SCRIPT []
+POSTHOOK: Lineage: alltypestiny.id SCRIPT []
+POSTHOOK: Lineage: alltypestiny.int_col SCRIPT []
+PREHOOK: query: create table alltypesagg(
+id int,
+int_col int,
+bool_col boolean
+)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@alltypesagg
+POSTHOOK: query: create table alltypesagg(
+id int,
+int_col int,
+bool_col boolean
+)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@alltypesagg
+PREHOOK: query: insert into alltypesagg(id, int_col, bool_col) values
+(1, 1, true),
+(2, 4, false),
+(5, 6, true),
+(null, null, false)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@alltypesagg
+POSTHOOK: query: insert into alltypesagg(id, int_col, bool_col) values
+(1, 1, true),
+(2, 4, false),
+(5, 6, true),
+(null, null, false)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@alltypesagg
+POSTHOOK: Lineage: alltypesagg.bool_col SCRIPT []
+POSTHOOK: Lineage: alltypesagg.id SCRIPT []
+POSTHOOK: Lineage: alltypesagg.int_col SCRIPT []
+Warning: Shuffle Join MERGEJOIN[64][tables = [$hdt$_0, $hdt$_1, $hdt$_2]] in Stage 'Reducer 3' is a cross product
+PREHOOK: query: explain cbo select *
+from alltypesagg t1
+where t1.id not in
+    (select tt1.id
+     from alltypestiny tt1 left JOIN alltypesagg tt2
+     on tt1.int_col = tt2.int_col)
+PREHOOK: type: QUERY
+PREHOOK: Input: default@alltypesagg
+PREHOOK: Input: default@alltypestiny
+#### A masked pattern was here ####
+POSTHOOK: query: explain cbo select *
+from alltypesagg t1
+where t1.id not in
+    (select tt1.id
+     from alltypestiny tt1 left JOIN alltypesagg tt2
+     on tt1.int_col = tt2.int_col)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@alltypesagg
+POSTHOOK: Input: default@alltypestiny
+#### A masked pattern was here ####
+CBO PLAN:
+HiveProject(id=[$0], int_col=[$1], bool_col=[$2])
+  HiveFilter(condition=[OR(=($3, 0), AND(IS NULL($6), >=($4, $3), IS NOT NULL($0)))])
+    HiveProject(id=[$0], int_col=[$1], bool_col=[$2], c=[$5], ck=[$6], id0=[$3], literalTrue=[$4])
+      HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
+        HiveJoin(condition=[=($0, $3)], joinType=[left], algorithm=[none], cost=[not available])
+          HiveProject(id=[$0], int_col=[$1], bool_col=[$2])
+            HiveTableScan(table=[[default, alltypesagg]], table:alias=[t1])
+          HiveProject(id=[$0], literalTrue=[true])
+            HiveAggregate(group=[{0}])
+              HiveJoin(condition=[=($1, $2)], joinType=[left], algorithm=[none], cost=[not available])
+                HiveProject(id=[$0], int_col=[$1])
+                  HiveFilter(condition=[IS NOT NULL($0)])
+                    HiveTableScan(table=[[default, alltypestiny]], table:alias=[tt1])
+                HiveProject(int_col=[$1])
+                  HiveFilter(condition=[IS NOT NULL($1)])
+                    HiveTableScan(table=[[default, alltypesagg]], table:alias=[tt2])
+        HiveProject(c=[$0], ck=[$1])
+          HiveAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)])
+            HiveJoin(condition=[=($1, $2)], joinType=[left], algorithm=[none], cost=[not available])
+              HiveProject(id=[$0], int_col=[$1])
+                HiveTableScan(table=[[default, alltypestiny]], table:alias=[tt1])
+              HiveProject(int_col=[$1])
+                HiveFilter(condition=[IS NOT NULL($1)])
+                  HiveTableScan(table=[[default, alltypesagg]], table:alias=[tt2])
+
+Warning: Shuffle Join MERGEJOIN[64][tables = [$hdt$_0, $hdt$_1, $hdt$_2]] in Stage 'Reducer 3' is a cross product
+PREHOOK: query: select *
+from alltypesagg t1
+where t1.id not in
+    (select tt1.id
+     from alltypestiny tt1 left JOIN alltypesagg tt2
+     on tt1.int_col = tt2.int_col)
+PREHOOK: type: QUERY
+PREHOOK: Input: default@alltypesagg
+PREHOOK: Input: default@alltypestiny
+#### A masked pattern was here ####
+POSTHOOK: query: select *
+from alltypesagg t1
+where t1.id not in
+    (select tt1.id
+     from alltypestiny tt1 left JOIN alltypesagg tt2
+     on tt1.int_col = tt2.int_col)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@alltypesagg
+POSTHOOK: Input: default@alltypestiny
+#### A masked pattern was here ####
+5	6	true
+Warning: Shuffle Join MERGEJOIN[63][tables = [$hdt$_1, $hdt$_2]] in Stage 'Reducer 6' is a cross product
+Warning: Shuffle Join MERGEJOIN[65][tables = [$hdt$_2, $hdt$_3]] in Stage 'Reducer 4' is a cross product
+PREHOOK: query: explain cbo select *
+from alltypesagg t1
+where t1.id not in
+    (select tt1.id
+     from alltypestiny tt1 left JOIN alltypesagg tt2
+     on tt1.int_col = t1.int_col)
+PREHOOK: type: QUERY
+PREHOOK: Input: default@alltypesagg
+PREHOOK: Input: default@alltypestiny
+#### A masked pattern was here ####
+POSTHOOK: query: explain cbo select *
+from alltypesagg t1
+where t1.id not in
+    (select tt1.id
+     from alltypestiny tt1 left JOIN alltypesagg tt2
+     on tt1.int_col = t1.int_col)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@alltypesagg
+POSTHOOK: Input: default@alltypestiny
+#### A masked pattern was here ####
+CBO PLAN:
+HiveProject(id=[$0], int_col=[$1], bool_col=[$2])
+  HiveFilter(condition=[OR(IS NULL($4), =($4, 0), IS NOT TRUE(OR(IS NOT NULL($7), IS NULL($0), <($5, $4))))])
+    HiveJoin(condition=[AND(=($0, $6), =($8, $1))], joinType=[left], algorithm=[none], cost=[not available])
+      HiveJoin(condition=[=($3, $1)], joinType=[left], algorithm=[none], cost=[not available])
+        HiveProject(id=[$0], int_col=[$1], bool_col=[$2])
+          HiveTableScan(table=[[default, alltypesagg]], table:alias=[t1])
+        HiveProject(int_col=[$0], c=[$1], ck=[$2])
+          HiveAggregate(group=[{1}], c=[COUNT()], ck=[COUNT($0)])
+            HiveJoin(condition=[true], joinType=[left], algorithm=[none], cost=[not available])
+              HiveProject(id=[$0], int_col=[$1])
+                HiveFilter(condition=[IS NOT NULL($1)])
+                  HiveTableScan(table=[[default, alltypestiny]], table:alias=[tt1])
+              HiveProject(DUMMY=[0])
+                HiveTableScan(table=[[default, alltypesagg]], table:alias=[tt2])
+      HiveProject(id=[$0], literalTrue=[true], int_col=[$1])
+        HiveAggregate(group=[{0, 1}])
+          HiveJoin(condition=[true], joinType=[left], algorithm=[none], cost=[not available])
+            HiveProject(id=[$0], int_col=[$1])
+              HiveFilter(condition=[AND(IS NOT NULL($1), IS NOT NULL($0))])
+                HiveTableScan(table=[[default, alltypestiny]], table:alias=[tt1])
+            HiveProject(DUMMY=[0])
+              HiveTableScan(table=[[default, alltypesagg]], table:alias=[tt2])
+
+Warning: Shuffle Join MERGEJOIN[63][tables = [$hdt$_1, $hdt$_2]] in Stage 'Reducer 6' is a cross product
+Warning: Shuffle Join MERGEJOIN[65][tables = [$hdt$_2, $hdt$_3]] in Stage 'Reducer 4' is a cross product
+PREHOOK: query: select *
+from alltypesagg t1
+where t1.id not in
+    (select tt1.id
+     from alltypestiny tt1 left JOIN alltypesagg tt2
+     on tt1.int_col = t1.int_col)
+PREHOOK: type: QUERY
+PREHOOK: Input: default@alltypesagg
+PREHOOK: Input: default@alltypestiny
+#### A masked pattern was here ####
+POSTHOOK: query: select *
+from alltypesagg t1
+where t1.id not in
+    (select tt1.id
+     from alltypestiny tt1 left JOIN alltypesagg tt2
+     on tt1.int_col = t1.int_col)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@alltypesagg
+POSTHOOK: Input: default@alltypestiny
+#### A masked pattern was here ####
+5	6	true
+NULL	NULL	false

Review comment:
       This row should not be there in the output. Checked on Postgres, and also the uncorrelated query doesn't output this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] soumyakanti3578 commented on a change in pull request #2302: [HIVE-25183] Inner Join condition parsing error in subquery

Posted by GitBox <gi...@apache.org>.
soumyakanti3578 commented on a change in pull request #2302:
URL: https://github.com/apache/hive/pull/2302#discussion_r644341372



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java
##########
@@ -34,6 +34,7 @@
 
 import javax.annotation.Nonnull;
 
+import org.apache.calcite.adapter.enumerable.EnumerableConvention;

Review comment:
       True, I missed that!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org