You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2020/07/20 19:30:43 UTC

[GitHub] [hive] pgaref opened a new pull request #1286: HIVE-23882: Compiler should skip MJ keyExpr for probe optimization

pgaref opened a new pull request #1286:
URL: https://github.com/apache/hive/pull/1286


   Change-Id: I1033a65f26592ef3683b7aa0a669e0c378667278
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-XXXXX: Fix a typo in YYY)
   For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] github-actions[bot] commented on pull request #1286: HIVE-23882: Compiler should skip MJ keyExpr for probe optimization

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #1286:
URL: https://github.com/apache/hive/pull/1286#issuecomment-695860126


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on pull request #1286: HIVE-23882: Compiler should skip MJ keyExpr for probe optimization

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on pull request #1286:
URL: https://github.com/apache/hive/pull/1286#issuecomment-781469495


   @pgaref I think it would be usefull to wait #1929  and see if this patch passes with the added check as well 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pgaref commented on a change in pull request #1286: HIVE-23882: Compiler should skip MJ keyExpr for probe optimization

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #1286:
URL: https://github.com/apache/hive/pull/1286#discussion_r578641665



##########
File path: ql/src/test/results/clientpositive/llap/auto_join10.q.out
##########
@@ -57,6 +57,7 @@ STAGE PLANS:
                 TableScan
                   alias: src
                   filterExpr: key is not null (type: boolean)
+                  probeDecodeDetails: cacheKey:HASH_MAP_MAPJOIN_30_container, bigKeyColName:key, smallTablePos:0, keyRatio:1.582

Review comment:
       In general thats the case, this particular query though is a self join where tsKeyCardinality is 500 while mjKeyCardinality is 791 thus the ratio is above 1. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pgaref closed pull request #1286: HIVE-23882: Compiler should skip MJ keyExpr for probe optimization

Posted by GitBox <gi...@apache.org>.
pgaref closed pull request #1286:
URL: https://github.com/apache/hive/pull/1286


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pgaref commented on a change in pull request #1286: HIVE-23882: Compiler should skip MJ keyExpr for probe optimization

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #1286:
URL: https://github.com/apache/hive/pull/1286#discussion_r570340192



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java
##########
@@ -654,11 +655,18 @@ public static String findTableColNameOf(Operator<?> start, String internalColNam
         continue;
       }
       // If columnName is the output of a ColumnExpr get the original columnName from the Expr Map
-      if (currentOp.getColumnExprMap() != null && currentOp.getColumnExprMap().containsKey(internalColName)
-              && currentOp.getColumnExprMap().get(internalColName) instanceof ExprNodeColumnDesc) {
-        internalColName = ((ExprNodeColumnDesc) currentOp.getColumnExprMap().get(internalColName)).getColumn();
+      if (currentOp.getColumnExprMap() != null && currentOp.getColumnExprMap().containsKey(internalColName)) {
+        // Only use colInfo that is ExprNodeColumnDesc (could even be a UDF on the key at this point)
+        if (currentOp.getColumnExprMap().get(internalColName) instanceof ExprNodeColumnDesc) {
+          internalColName = ((ExprNodeColumnDesc) currentOp.getColumnExprMap().get(internalColName)).getColumn();
+          keyColInfo = currentOp.getSchema().getColumnInfo(internalColName);

Review comment:
       Thanks for the comments @kgyrtkirk ! Updated the PR to make use of backtracking logic -- had to update it a bit to branch-out in a multi-parent scenario. Let me know what you think




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #1286: HIVE-23882: Compiler should skip MJ keyExpr for probe optimization

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #1286:
URL: https://github.com/apache/hive/pull/1286#discussion_r569557253



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java
##########
@@ -654,11 +655,18 @@ public static String findTableColNameOf(Operator<?> start, String internalColNam
         continue;
       }
       // If columnName is the output of a ColumnExpr get the original columnName from the Expr Map
-      if (currentOp.getColumnExprMap() != null && currentOp.getColumnExprMap().containsKey(internalColName)
-              && currentOp.getColumnExprMap().get(internalColName) instanceof ExprNodeColumnDesc) {
-        internalColName = ((ExprNodeColumnDesc) currentOp.getColumnExprMap().get(internalColName)).getColumn();
+      if (currentOp.getColumnExprMap() != null && currentOp.getColumnExprMap().containsKey(internalColName)) {
+        // Only use colInfo that is ExprNodeColumnDesc (could even be a UDF on the key at this point)
+        if (currentOp.getColumnExprMap().get(internalColName) instanceof ExprNodeColumnDesc) {
+          internalColName = ((ExprNodeColumnDesc) currentOp.getColumnExprMap().get(internalColName)).getColumn();
+          keyColInfo = currentOp.getSchema().getColumnInfo(internalColName);

Review comment:
       this method somewhat reminds me to the `backtrack` methods we have in the `ExprNodeDescUtils` class - I don't know if you will find a perfect match ; but might worth to take a look at them




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pgaref commented on a change in pull request #1286: HIVE-23882: Compiler should skip MJ keyExpr for probe optimization

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #1286:
URL: https://github.com/apache/hive/pull/1286#discussion_r570340192



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java
##########
@@ -654,11 +655,18 @@ public static String findTableColNameOf(Operator<?> start, String internalColNam
         continue;
       }
       // If columnName is the output of a ColumnExpr get the original columnName from the Expr Map
-      if (currentOp.getColumnExprMap() != null && currentOp.getColumnExprMap().containsKey(internalColName)
-              && currentOp.getColumnExprMap().get(internalColName) instanceof ExprNodeColumnDesc) {
-        internalColName = ((ExprNodeColumnDesc) currentOp.getColumnExprMap().get(internalColName)).getColumn();
+      if (currentOp.getColumnExprMap() != null && currentOp.getColumnExprMap().containsKey(internalColName)) {
+        // Only use colInfo that is ExprNodeColumnDesc (could even be a UDF on the key at this point)
+        if (currentOp.getColumnExprMap().get(internalColName) instanceof ExprNodeColumnDesc) {
+          internalColName = ((ExprNodeColumnDesc) currentOp.getColumnExprMap().get(internalColName)).getColumn();
+          keyColInfo = currentOp.getSchema().getColumnInfo(internalColName);

Review comment:
       Thanks for the comments @kgyrtkirk ! Updated the PR to make use of backtracking logic -- had to update it a bit to branch-out in a multi-parent scenario. Let me know what you think

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java
##########
@@ -654,11 +655,18 @@ public static String findTableColNameOf(Operator<?> start, String internalColNam
         continue;
       }
       // If columnName is the output of a ColumnExpr get the original columnName from the Expr Map
-      if (currentOp.getColumnExprMap() != null && currentOp.getColumnExprMap().containsKey(internalColName)
-              && currentOp.getColumnExprMap().get(internalColName) instanceof ExprNodeColumnDesc) {
-        internalColName = ((ExprNodeColumnDesc) currentOp.getColumnExprMap().get(internalColName)).getColumn();
+      if (currentOp.getColumnExprMap() != null && currentOp.getColumnExprMap().containsKey(internalColName)) {
+        // Only use colInfo that is ExprNodeColumnDesc (could even be a UDF on the key at this point)
+        if (currentOp.getColumnExprMap().get(internalColName) instanceof ExprNodeColumnDesc) {
+          internalColName = ((ExprNodeColumnDesc) currentOp.getColumnExprMap().get(internalColName)).getColumn();
+          keyColInfo = currentOp.getSchema().getColumnInfo(internalColName);

Review comment:
       Thanks for the comments @kgyrtkirk ! Updated the PR to make use of backtracking logic -- had to update it a bit to branch-out in a multi-parent scenario (I believe this is the right thing to do in any case, unless you want to bail out early). Let me know what you think




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #1286: HIVE-23882: Compiler should skip MJ keyExpr for probe optimization

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #1286:
URL: https://github.com/apache/hive/pull/1286#discussion_r578602640



##########
File path: ql/src/test/results/clientpositive/llap/auto_join10.q.out
##########
@@ -57,6 +57,7 @@ STAGE PLANS:
                 TableScan
                   alias: src
                   filterExpr: key is not null (type: boolean)
+                  probeDecodeDetails: cacheKey:HASH_MAP_MAPJOIN_30_container, bigKeyColName:key, smallTablePos:0, keyRatio:1.582

Review comment:
       why is `keyRatio` above 1? shouldn't it mean the expected selectivity of the operation?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
##########
@@ -362,26 +362,26 @@ public static boolean isDeterministic(ExprNodeDesc desc) {
    */
   public static ArrayList<ExprNodeDesc> backtrack(List<ExprNodeDesc> sources,
       Operator<?> current, Operator<?> terminal) throws SemanticException {
-    return backtrack(sources, current, terminal, false);
+    return backtrack(sources, current, terminal, false, false);
   }
 
   public static ArrayList<ExprNodeDesc> backtrack(List<ExprNodeDesc> sources,
-      Operator<?> current, Operator<?> terminal, boolean foldExpr) throws SemanticException {
-    ArrayList<ExprNodeDesc> result = new ArrayList<ExprNodeDesc>();
+      Operator<?> current, Operator<?> terminal, boolean foldExpr, boolean skipRSParent) throws SemanticException {

Review comment:
       I think `skipRSParent` is a bit misleading ; you don't want to skip the RS - you want to stay in the same vertex

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
##########
@@ -1589,13 +1588,17 @@ private void removeSemijoinsParallelToMapJoin(OptimizeTezProcContext procCtx)
 
       List<ExprNodeDesc> keyDesc = selectedMJOp.getConf().getKeys().get(posBigTable);
       ExprNodeColumnDesc keyCol = (ExprNodeColumnDesc) keyDesc.get(0);
-      String realTSColName = OperatorUtils.findTableColNameOf(selectedMJOp, keyCol.getColumn());
-      if (realTSColName != null) {
+      ExprNodeColumnDesc originTSColExpr = OperatorUtils.findTableOriginColExpr(keyCol, selectedMJOp, tsOp);
+      if (originTSColExpr == null) {
+        LOG.warn("ProbeDecode could not find origTSCol for mjCol: {} with MJ Schema: {}",

Review comment:
       current algorithm seems to be:
   * select best mj candidate
   * do some further processing - which may bail out
   
   bailing out for the best candidate doesn't neccessarily mean that we will still bail out for a less charming candidate - I think it might worth to try to restructure the extra compilation into to for loop - or instead of selecting the best candidate the first part could be implemented as a priority logic
   
   just an idea for a followup

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java
##########
@@ -120,7 +120,7 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
             String outputColumnName = cSELOutputColumnNames.get(i);
             ExprNodeDesc cSELExprNodeDesc = cSELColList.get(i);
             ExprNodeDesc newPSELExprNodeDesc =
-                ExprNodeDescUtils.backtrack(cSELExprNodeDesc, cSEL, pSEL, true);
+                ExprNodeDescUtils.backtrack(cSELExprNodeDesc, cSEL, pSEL, true, false);

Review comment:
       instead of modifying every callsite - can we have a method with the original signature?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #1286: HIVE-23882: Compiler should skip MJ keyExpr for probe optimization

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #1286:
URL: https://github.com/apache/hive/pull/1286#discussion_r569549445



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java
##########
@@ -654,11 +655,18 @@ public static String findTableColNameOf(Operator<?> start, String internalColNam
         continue;
       }
       // If columnName is the output of a ColumnExpr get the original columnName from the Expr Map
-      if (currentOp.getColumnExprMap() != null && currentOp.getColumnExprMap().containsKey(internalColName)
-              && currentOp.getColumnExprMap().get(internalColName) instanceof ExprNodeColumnDesc) {
-        internalColName = ((ExprNodeColumnDesc) currentOp.getColumnExprMap().get(internalColName)).getColumn();
+      if (currentOp.getColumnExprMap() != null && currentOp.getColumnExprMap().containsKey(internalColName)) {
+        // Only use colInfo that is ExprNodeColumnDesc (could even be a UDF on the key at this point)
+        if (currentOp.getColumnExprMap().get(internalColName) instanceof ExprNodeColumnDesc) {
+          internalColName = ((ExprNodeColumnDesc) currentOp.getColumnExprMap().get(internalColName)).getColumn();
+          keyColInfo = currentOp.getSchema().getColumnInfo(internalColName);

Review comment:
       this doesn't look right to me; these 2 lines are mapping back to the previous ops colname and look it up on the current operator - if this does fix some issue for you ; then I guess there were already some issue with the mapping/schema - I think the original bug should be fixed in this case; because the output of the above loopback might be undefined
   
   I think this fix may cause troubles later...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pgaref commented on pull request #1286: HIVE-23882: Compiler should skip MJ keyExpr for probe optimization

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #1286:
URL: https://github.com/apache/hive/pull/1286#issuecomment-662612790


   @ashutoshc @jcamachor  can you please take a look?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] github-actions[bot] closed pull request #1286: HIVE-23882: Compiler should skip MJ keyExpr for probe optimization

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #1286:
URL: https://github.com/apache/hive/pull/1286


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #1286: HIVE-23882: Compiler should skip MJ keyExpr for probe optimization

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #1286:
URL: https://github.com/apache/hive/pull/1286#discussion_r569549445



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java
##########
@@ -654,11 +655,18 @@ public static String findTableColNameOf(Operator<?> start, String internalColNam
         continue;
       }
       // If columnName is the output of a ColumnExpr get the original columnName from the Expr Map
-      if (currentOp.getColumnExprMap() != null && currentOp.getColumnExprMap().containsKey(internalColName)
-              && currentOp.getColumnExprMap().get(internalColName) instanceof ExprNodeColumnDesc) {
-        internalColName = ((ExprNodeColumnDesc) currentOp.getColumnExprMap().get(internalColName)).getColumn();
+      if (currentOp.getColumnExprMap() != null && currentOp.getColumnExprMap().containsKey(internalColName)) {
+        // Only use colInfo that is ExprNodeColumnDesc (could even be a UDF on the key at this point)
+        if (currentOp.getColumnExprMap().get(internalColName) instanceof ExprNodeColumnDesc) {
+          internalColName = ((ExprNodeColumnDesc) currentOp.getColumnExprMap().get(internalColName)).getColumn();
+          keyColInfo = currentOp.getSchema().getColumnInfo(internalColName);

Review comment:
       this doesn't look right to me; these 2 lines are mapping back to the previous ops colname and look it up on the current operator - if something like this happens - then the original bug should be fixed.
   
   I think this fix may cause troubles later...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pgaref commented on a change in pull request #1286: HIVE-23882: Compiler should skip MJ keyExpr for probe optimization

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #1286:
URL: https://github.com/apache/hive/pull/1286#discussion_r578636427



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java
##########
@@ -120,7 +120,7 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
             String outputColumnName = cSELOutputColumnNames.get(i);
             ExprNodeDesc cSELExprNodeDesc = cSELColList.get(i);
             ExprNodeDesc newPSELExprNodeDesc =
-                ExprNodeDescUtils.backtrack(cSELExprNodeDesc, cSEL, pSEL, true);
+                ExprNodeDescUtils.backtrack(cSELExprNodeDesc, cSEL, pSEL, true, false);

Review comment:
       sure, done

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
##########
@@ -362,26 +362,26 @@ public static boolean isDeterministic(ExprNodeDesc desc) {
    */
   public static ArrayList<ExprNodeDesc> backtrack(List<ExprNodeDesc> sources,
       Operator<?> current, Operator<?> terminal) throws SemanticException {
-    return backtrack(sources, current, terminal, false);
+    return backtrack(sources, current, terminal, false, false);
   }
 
   public static ArrayList<ExprNodeDesc> backtrack(List<ExprNodeDesc> sources,
-      Operator<?> current, Operator<?> terminal, boolean foldExpr) throws SemanticException {
-    ArrayList<ExprNodeDesc> result = new ArrayList<ExprNodeDesc>();
+      Operator<?> current, Operator<?> terminal, boolean foldExpr, boolean skipRSParent) throws SemanticException {

Review comment:
       sure, renamed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #1286: HIVE-23882: Compiler should skip MJ keyExpr for probe optimization

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #1286:
URL: https://github.com/apache/hive/pull/1286#discussion_r569552909



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java
##########
@@ -654,11 +655,18 @@ public static String findTableColNameOf(Operator<?> start, String internalColNam
         continue;
       }
       // If columnName is the output of a ColumnExpr get the original columnName from the Expr Map
-      if (currentOp.getColumnExprMap() != null && currentOp.getColumnExprMap().containsKey(internalColName)
-              && currentOp.getColumnExprMap().get(internalColName) instanceof ExprNodeColumnDesc) {
-        internalColName = ((ExprNodeColumnDesc) currentOp.getColumnExprMap().get(internalColName)).getColumn();

Review comment:
       the old code also has similar issues




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pgaref commented on pull request #1286: HIVE-23882: Compiler should skip MJ keyExpr for probe optimization

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #1286:
URL: https://github.com/apache/hive/pull/1286#issuecomment-776684576


   Hey @kgyrtkirk can you please take another look here? Tests passed, had to retriger.
   
   Extended the backtracking logic to use the non-RS brach for the probe-MJ logic -- this actually revealed several missed or wrong optimizations (updated q outs).
   Also extended the optimization logic to check if there a type Cast between src and destination (as the types have to match in the probe case) and only use it on the LLAP mode (probedecode's target).
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pgaref commented on a change in pull request #1286: HIVE-23882: Compiler should skip MJ keyExpr for probe optimization

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #1286:
URL: https://github.com/apache/hive/pull/1286#discussion_r578636105



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
##########
@@ -1589,13 +1588,17 @@ private void removeSemijoinsParallelToMapJoin(OptimizeTezProcContext procCtx)
 
       List<ExprNodeDesc> keyDesc = selectedMJOp.getConf().getKeys().get(posBigTable);
       ExprNodeColumnDesc keyCol = (ExprNodeColumnDesc) keyDesc.get(0);
-      String realTSColName = OperatorUtils.findTableColNameOf(selectedMJOp, keyCol.getColumn());
-      if (realTSColName != null) {
+      ExprNodeColumnDesc originTSColExpr = OperatorUtils.findTableOriginColExpr(keyCol, selectedMJOp, tsOp);
+      if (originTSColExpr == null) {
+        LOG.warn("ProbeDecode could not find origTSCol for mjCol: {} with MJ Schema: {}",

Review comment:
       Yeah, always had this in the back of my head. Opened HIVE-24793 to track 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pgaref commented on a change in pull request #1286: HIVE-23882: Compiler should skip MJ keyExpr for probe optimization

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #1286:
URL: https://github.com/apache/hive/pull/1286#discussion_r570340192



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java
##########
@@ -654,11 +655,18 @@ public static String findTableColNameOf(Operator<?> start, String internalColNam
         continue;
       }
       // If columnName is the output of a ColumnExpr get the original columnName from the Expr Map
-      if (currentOp.getColumnExprMap() != null && currentOp.getColumnExprMap().containsKey(internalColName)
-              && currentOp.getColumnExprMap().get(internalColName) instanceof ExprNodeColumnDesc) {
-        internalColName = ((ExprNodeColumnDesc) currentOp.getColumnExprMap().get(internalColName)).getColumn();
+      if (currentOp.getColumnExprMap() != null && currentOp.getColumnExprMap().containsKey(internalColName)) {
+        // Only use colInfo that is ExprNodeColumnDesc (could even be a UDF on the key at this point)
+        if (currentOp.getColumnExprMap().get(internalColName) instanceof ExprNodeColumnDesc) {
+          internalColName = ((ExprNodeColumnDesc) currentOp.getColumnExprMap().get(internalColName)).getColumn();
+          keyColInfo = currentOp.getSchema().getColumnInfo(internalColName);

Review comment:
       Thanks for the comments @kgyrtkirk ! Updated the PR to make use of backtracking logic -- had to update it a bit to branch-out in a multi-parent scenario (I believe this is the right thing to do in any case, unless you want to bail out early). Let me know what you think




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] github-actions[bot] commented on pull request #1286: HIVE-23882: Compiler should skip MJ keyExpr for probe optimization

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #1286:
URL: https://github.com/apache/hive/pull/1286#issuecomment-695860126


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org