
Posted to gitbox@hive.apache.org by "ngsg (via GitHub)" <gi...@apache.org> on 2023/03/14 09:49:34 UTC

[GitHub] [hive] ngsg opened a new pull request, #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

ngsg opened a new pull request, #4115:
URL: https://github.com/apache/hive/pull/4115


### What changes were proposed in this pull request?

When Hive runs on the Tez engine, the filterTag is computed in the ReduceSink (RS) operator and passed to the MapJoin operator.

### Why are the changes needed?

The MapJoin operator expects rows from small tables to carry a filterTag.
This holds for the MapReduce engine because HashTableSinkOperator writes files that include the filterTag.
However, the Tez engine has no equivalent logic, and the missing filterTag causes an NPE at runtime.

We found this bug by running mapjoin_filter_on_outerjoin.q on the Tez engine.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

I tested this patch by running mapjoin_filter_on_outerjoin.q on the Tez engine.
I added this file to minillap.query.files so that both the MR and Tez engines run this qfile test.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org

[GitHub] [hive] sonarcloud[bot] commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.

sonarcloud[bot] commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1471411387

   Kudos, SonarCloud Quality Gate passed! [Quality Gate passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4115)
   
   [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4115&resolved=false&types=BUG) (rating A)  
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4115&resolved=false&types=VULNERABILITY) (rating A)  
   [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4115&resolved=false&types=SECURITY_HOTSPOT) (rating A)  
   [2 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4115&resolved=false&types=CODE_SMELL) (rating A)
   
   No Coverage information  
   No Duplication information
   
   



[GitHub] [hive] sonarcloud[bot] commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.

sonarcloud[bot] commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1467850634

   Kudos, SonarCloud Quality Gate passed! [Quality Gate passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4115)
   
   [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4115&resolved=false&types=BUG) (rating A)  
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4115&resolved=false&types=VULNERABILITY) (rating A)  
   [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4115&resolved=false&types=SECURITY_HOTSPOT) (rating A)  
   [2 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4115&resolved=false&types=CODE_SMELL) (rating A)
   
   No Coverage information  
   No Duplication information
   
   



[GitHub] [hive] github-actions[bot] closed pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] closed pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp
URL: https://github.com/apache/hive/pull/4115



[GitHub] [hive] sonarcloud[bot] commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.

sonarcloud[bot] commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1707851696

   Kudos, SonarCloud Quality Gate passed! [Quality Gate passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4115)
   
   [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4115&resolved=false&types=BUG) (rating A)  
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4115&resolved=false&types=VULNERABILITY) (rating A)  
   [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4115&resolved=false&types=SECURITY_HOTSPOT) (rating A)  
   [10 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4115&resolved=false&types=CODE_SMELL) (rating A)
   
   No Coverage information  
   No Duplication information
   
   Warning: The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
   Read more [here](https://docs.sonarcloud.io/appendices/scanner-environment/)
   
   
   



[GitHub] [hive] ngsg commented on a diff in pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "ngsg (via GitHub)" <gi...@apache.org>.

ngsg commented on code in PR #4115:
URL: https://github.com/apache/hive/pull/4115#discussion_r1316780221


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ExtendParentReduceSinkOfMapJoin.java:
##########
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+public class ExtendParentReduceSinkOfMapJoin implements SemanticNodeProcessor {

Review Comment:
   I renamed it and added a comment about it.




[GitHub] [hive] ngsg commented on a diff in pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "ngsg (via GitHub)" <gi...@apache.org>.

ngsg commented on code in PR #4115:
URL: https://github.com/apache/hive/pull/4115#discussion_r1317132692


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ExtendParentReduceSinkOfMapJoin.java:
##########
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+public class ExtendParentReduceSinkOfMapJoin implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() != null) {
+      int[][] filterMap = mapJoinDesc.getFilterMap();
+
+      // 1. Extend ReduceSinkoperator if it's output is filtered by MapJoinOperator.
+      for (byte pos = 0; pos < filterMap.length; pos++) {
+        if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+          continue;
+        }
+
+        ExprNodeDesc filterTagExpr =
+            generateFilterTagExpression(filterMap[pos], mapJoinDesc.getFilters().get(pos));
+
+        // Note that the parent RS for the given pos is retrieved in different way in MapJoinProcessor.
+        // TODO: MapJoinProcessor.convertMapJoin() fixes the order of parent operators.
+        //  Does other callers also fix the order as well as MapJoinProcessor.convertMapJoin()?
+        ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+        ReduceSinkDesc pRsConf = parent.getConf();
+
+        // MapJoinProcessor.getMapJoinDesc() replaces filter expressions with backtracked one if
+        // adjustParentsChildren is true.
+        // As of now, ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin() is the only functions that
+        // calls this method with adjustParentsChildren = false. Therefore, we backtrack filter expressions
+        // only if MapJoinDesc.isDynamicPartitionHashJoin is true, which is also the unique property of
+        // ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin().
+        ExprNodeDesc mapSideFilterTagExpr;
+        if (mapJoinDesc.isDynamicPartitionHashJoin()) {
+          mapSideFilterTagExpr = ExprNodeDescUtils.backtrack(filterTagExpr, mapJoinOp, parent);
+        } else {
+          mapSideFilterTagExpr = filterTagExpr;
+        }
+        String filterColumnName = "_filterTag";
+
+        pRsConf.getValueCols().add(mapSideFilterTagExpr);
+        pRsConf.getOutputValueColumnNames().add(filterColumnName);
+        pRsConf.getColumnExprMap()
+            .put(Utilities.ReduceField.VALUE + "." + filterColumnName, mapSideFilterTagExpr);
+
+        ColumnInfo filterTagColumnInfo =
+            new ColumnInfo(Utilities.ReduceField.VALUE + "." + filterColumnName, shortType, "", false);
+        parent.getSchema().getSignature().add(filterTagColumnInfo);
+
+        TableDesc newTableDesc =
+            PlanUtils.getReduceValueTableDesc(
+                PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "_col"));
+        pRsConf.setValueSerializeInfo(newTableDesc);
+      }
+
+      // 2. Update MapJoinOperator's valueFilteredTableDescs.
+      // Unlike HashTableSinkOperator used in MR engine, Tez engine directly passes rows from RS to MapJoin.
+      // Therefore, RS's writer and MapJoin's reader should have the same TableDesc. We create valueTableDesc
+      // here again because it can be different from RS's valueSerializeInfo due to ColumnPruner.
+      List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>();
+      for (byte pos = 0; pos < mapJoinOp.getParentOperators().size(); pos++) {
+        TableDesc tableDesc;
+
+        if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+          // We did not change corresponding parent operator. Use the original tableDesc.
+          tableDesc = mapJoinDesc.getValueFilteredTblDescs().get(pos);
+        } else {
+          // Create a new TableDesc based on corresponding parent RSOperator.
+          ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+          ReduceSinkDesc pRsConf = parent.getConf();
+
+          tableDesc =
+              PlanUtils.getMapJoinValueTableDesc(
+                  PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "mapjoinvalue"));
+        }
+
+        newMapJoinValueFilteredTableDescs.add(tableDesc);
+      }
+      mapJoinDesc.setValueFilteredTblDescs(newMapJoinValueFilteredTableDescs);
+    }
+
+    return null;
+  }
+
+  private ExprNodeDesc generateFilterTagExpression(int[] filterMap, List<ExprNodeDesc> filterExprs) {
+    ExprNodeDesc filterTagExpr = new ExprNodeConstantDesc(shortType, (short) 0);
+    Map<Byte, ExprNodeDesc> filterExprMap = getFilterExprMap(filterMap, filterExprs);
+
+    for (Map.Entry<Byte, ExprNodeDesc> entry: filterExprMap.entrySet()) {
+      ExprNodeDesc filterTagMaskExpr = generateFilterTagMask(entry.getKey(), entry.getValue());
+
+      if (filterTagExpr instanceof ExprNodeConstantDesc) {
+        filterTagExpr = filterTagMaskExpr;
+      } else {
+        List<ExprNodeDesc> plusArgs = new ArrayList<>(2);
+        plusArgs.add(filterTagMaskExpr);
+        plusArgs.add(filterTagExpr);
+        filterTagExpr = new ExprNodeGenericFuncDesc(shortType, new GenericUDFOPPlus(), plusArgs);
+      }
+    }
+
+    return filterTagExpr;
+  }
+
+  private Map<Byte, ExprNodeDesc> getFilterExprMap(int[] filterInfo, List<ExprNodeDesc> filterExprs) {
+    Map<Byte, ExprNodeDesc> filterExprMap = new HashMap<>();
+
+    int exprListOffset = 0;
+    for (int idx = 0; idx < filterInfo.length; idx = idx + 2) {
+      byte tag = (byte) filterInfo[idx];
+      int length = filterInfo[idx + 1];
+
+      int nextExprOffset = exprListOffset + length;
+      List<ExprNodeDesc> andArgs = filterExprs.subList(exprListOffset, nextExprOffset);
+      exprListOffset = nextExprOffset;
+
+      if (andArgs.size() == 1) {
+        filterExprMap.put(tag, andArgs.get(0));
+      } else if (andArgs.size() > 1) {
+        filterExprMap.put(tag, ExprNodeDescUtils.and(andArgs));
+      }
+    }
+
+    return filterExprMap;
+  }
+
+  private ExprNodeDesc generateFilterTagMask(byte tag, ExprNodeDesc condition) {

Review Comment:
   The meaning of the value is correct. filterTag == 0 means that the row satisfies all join conditions and will be forwarded. If filterTag is greater than 0, the row should be filtered out unless there is an outer join.
   
   The meaning of the mask is a bit more complicated: it indicates the alias of the table associated with the failed filter expression. (An alias is also called a tag or pos.)
   
   For example, suppose that we are joining 3 tables A, B, and C using the following SQL command.
   (Assume that alias 0 = table A, alias 1 = table B, and alias 2 = table C.)
   ```
   SELECT * FROM A JOIN B ON A.k = B.k AND A.k > 10 AND A.v = 0 JOIN C ON A.k = C.k AND A.k < 20;
   ```
   There are 3 filter expressions in the above command: `A.k > 10`, `A.v = 0`, and `A.k < 20`. All of them are evaluated on table A, but they do not belong to the same join relation: `A.k > 10` and `A.v = 0` belong to `A JOIN B`, while `A.k < 20` belongs to `A JOIN C`.
   The mask is used to distinguish `A JOIN B` from `A JOIN C` in this situation. If `(filterTag & 0x04) != 0`, i.e. the third lowest bit of filterTag is 1, then one of the filter expressions associated with alias 2 returned false. In our case, alias 2 means table C, so `(filterTag & 0x04) != 0` means that `A.k < 20` returned false.
   
   filterTag is used to compute outer joins properly. You can refer to CommonJoinOperator.genObject(), which uses getFilterTag() and plays a central role in the join algorithm.
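
The bit layout described above can be illustrated with a small standalone sketch. It is not the code from this PR and does not use any Hive classes: `FilterTagSketch`, `computeFilterTag`, and `isFiltered` are hypothetical names, and the `filterMap` layout of (alias, number of filter expressions) pairs is assumed from how `getFilterExprMap()` in the diff reads it. The per-expression results are passed in as plain booleans instead of being evaluated on a row.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Hive or of this PR.
final class FilterTagSketch {

  /**
   * Builds a filterTag from (alias, count) pairs and per-expression results.
   * Bit `alias` is set when at least one filter expression of that alias fails.
   */
  static short computeFilterTag(int[] filterMap, List<Boolean> results) {
    short filterTag = 0;
    int offset = 0;
    for (int idx = 0; idx < filterMap.length; idx += 2) {
      int alias = filterMap[idx];
      int count = filterMap[idx + 1];
      boolean passed = true;
      for (int i = offset; i < offset + count; i++) {
        Boolean result = results.get(i);
        // A null (unknown) result is treated as a failed filter in this sketch.
        if (result == null || !result) {
          passed = false;
        }
      }
      offset += count;
      if (!passed) {
        filterTag |= (short) (1 << alias);
      }
    }
    return filterTag;
  }

  /** True when some filter expression associated with the given alias failed. */
  static boolean isFiltered(short filterTag, int alias) {
    return (filterTag & (1 << alias)) != 0;
  }

  public static void main(String[] args) {
    // Filters on table A: two for alias 1 (A.k > 10, A.v = 0), one for alias 2 (A.k < 20).
    int[] filterMap = {1, 2, 2, 1};

    // Row with k = 25, v = 0: only A.k < 20 fails, so bit 2 (0x04) is set.
    short tag = computeFilterTag(filterMap, Arrays.asList(true, true, false));
    System.out.println(tag);                // 4
    System.out.println(isFiltered(tag, 1)); // false
    System.out.println(isFiltered(tag, 2)); // true
  }
}
```

Note the parentheses in `(filterTag & (1 << alias)) != 0`: in Java, `==` and `!=` bind more tightly than `&`, so the mask test has to be parenthesized before it is compared.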




[GitHub] [hive] soenkeliebau commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "soenkeliebau (via GitHub)" <gi...@apache.org>.

soenkeliebau commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1720766916

   Apologies!
   I have no relation to this PR. I was digging through Hive issues a few days ago and was being a klutz in the search results. I must have accidentally assigned this to me. I'll correct that.



[GitHub] [hive] ngsg commented on a diff in pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "ngsg (via GitHub)" <gi...@apache.org>.

ngsg commented on code in PR #4115:
URL: https://github.com/apache/hive/pull/4115#discussion_r1316786419


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ExtendParentReduceSinkOfMapJoin.java:
##########
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+public class ExtendParentReduceSinkOfMapJoin implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() != null) {
+      int[][] filterMap = mapJoinDesc.getFilterMap();
+
+      // 1. Extend ReduceSinkOperator if its output is filtered by MapJoinOperator.
+      for (byte pos = 0; pos < filterMap.length; pos++) {
+        if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+          continue;
+        }
+
+        ExprNodeDesc filterTagExpr =
+            generateFilterTagExpression(filterMap[pos], mapJoinDesc.getFilters().get(pos));
+
+        // Note that the parent RS for the given pos is retrieved in a different way in MapJoinProcessor.
+        // TODO: MapJoinProcessor.convertMapJoin() fixes the order of parent operators.
+        //  Do other callers also fix the order, as MapJoinProcessor.convertMapJoin() does?
+        ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+        ReduceSinkDesc pRsConf = parent.getConf();
+
+        // MapJoinProcessor.getMapJoinDesc() replaces filter expressions with backtracked ones if
+        // adjustParentsChildren is true.
+        // As of now, ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin() is the only function that
+        // calls this method with adjustParentsChildren = false. Therefore, we backtrack filter expressions
+        // only if MapJoinDesc.isDynamicPartitionHashJoin is true, which holds only for joins created by
+        // ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin().
+        ExprNodeDesc mapSideFilterTagExpr;
+        if (mapJoinDesc.isDynamicPartitionHashJoin()) {
+          mapSideFilterTagExpr = ExprNodeDescUtils.backtrack(filterTagExpr, mapJoinOp, parent);
+        } else {
+          mapSideFilterTagExpr = filterTagExpr;
+        }
+        String filterColumnName = "_filterTag";
+
+        pRsConf.getValueCols().add(mapSideFilterTagExpr);
+        pRsConf.getOutputValueColumnNames().add(filterColumnName);
+        pRsConf.getColumnExprMap()
+            .put(Utilities.ReduceField.VALUE + "." + filterColumnName, mapSideFilterTagExpr);
+
+        ColumnInfo filterTagColumnInfo =
+            new ColumnInfo(Utilities.ReduceField.VALUE + "." + filterColumnName, shortType, "", false);
+        parent.getSchema().getSignature().add(filterTagColumnInfo);
+
+        TableDesc newTableDesc =
+            PlanUtils.getReduceValueTableDesc(
+                PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "_col"));
+        pRsConf.setValueSerializeInfo(newTableDesc);
+      }
+
+      // 2. Update MapJoinOperator's valueFilteredTableDescs.
+      // Unlike HashTableSinkOperator used in MR engine, Tez engine directly passes rows from RS to MapJoin.
+      // Therefore, RS's writer and MapJoin's reader should have the same TableDesc. We create valueTableDesc
+      // here again because it can be different from RS's valueSerializeInfo due to ColumnPruner.
+      List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>();
+      for (byte pos = 0; pos < mapJoinOp.getParentOperators().size(); pos++) {
+        TableDesc tableDesc;
+
+        if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+          // We did not change corresponding parent operator. Use the original tableDesc.
+          tableDesc = mapJoinDesc.getValueFilteredTblDescs().get(pos);
+        } else {
+          // Create a new TableDesc based on corresponding parent RSOperator.
+          ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+          ReduceSinkDesc pRsConf = parent.getConf();
+
+          tableDesc =
+              PlanUtils.getMapJoinValueTableDesc(
+                  PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "mapjoinvalue"));
+        }
+
+        newMapJoinValueFilteredTableDescs.add(tableDesc);
+      }
+      mapJoinDesc.setValueFilteredTblDescs(newMapJoinValueFilteredTableDescs);
+    }
+
+    return null;
+  }
+
+  private ExprNodeDesc generateFilterTagExpression(int[] filterMap, List<ExprNodeDesc> filterExprs) {

Review Comment:
   The resulting ExprNodeDesc should compute the same value as JoinUtil.isFiltered(), standing in for the call that HashTableSinkOperator (MR engine) or MapJoinOperator (Tez engine, big table only) would otherwise make. I added a comment that names the reference method.
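
   For reference, the per-row value that the generated expression has to reproduce can be sketched in plain Java. This is a minimal sketch and not Hive's code: `FilterTagSketch` and `filterTag` are hypothetical names, the filter conditions are assumed to be already evaluated into a boolean array, and the filterMap layout of (tag, length) pairs is taken from the Javadoc of getFilterExprMap in this PR.

```java
public class FilterTagSketch {
  /**
   * filterMap is a sequence of (tag, length) pairs. results holds the already
   * evaluated filter conditions in the same order (an assumption of this sketch).
   */
  public static short filterTag(int[] filterMap, boolean[] results) {
    short filterTag = 0;
    int offset = 0;
    for (int i = 0; i < filterMap.length; i += 2) {
      int tag = filterMap[i];
      int length = filterMap[i + 1];

      // A tag passes only if all of its conditions hold.
      boolean passed = true;
      for (int j = offset; j < offset + length; j++) {
        passed &= results[j];
      }
      offset += length;

      // A failed tag sets its own bit in the result.
      if (!passed) {
        filterTag |= (short) (1 << tag);
      }
    }
    return filterTag;
  }

  public static void main(String[] args) {
    // filterMap = {0, 2, 1, 3}: two conditions for tag 0, three for tag 1.
    boolean[] results = {true, true, true, false, true};
    System.out.println(filterTag(new int[] {0, 2, 1, 3}, results)); // prints 2
  }
}
```

   In the example, tag 0 passes and tag 1 fails, so only bit 1 is set.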




[GitHub] [hive] ngsg commented on a diff in pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "ngsg (via GitHub)" <gi...@apache.org>.

ngsg commented on code in PR #4115:
URL: https://github.com/apache/hive/pull/4115#discussion_r1317145596


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FiltertagAppenderProc.java:
##########
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+/**
+ * Append a filterTag computation column to ReduceSinkOperators whose child is MapJoinOperator.
+ * The added column expresses JoinUtil#isFiltered in the form of ExprNode.
+ * The added column should be located at the end of row as CommonJoinOperator expects it.
+ *
+ * This processor only affects small tables of MapJoin that run on Tez engine.
+ * For big table, MapJoinOperator#process() calls CommonJoinOperator#getFilteredValue(), which adds filterTag.
+ * For MapReduce engine, HashTableSinkOperator adds filterTag to every row.
+ */
+public class FiltertagAppenderProc implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() == null) {
+      return null;
+    }
+
+    int[][] filterMap = mapJoinDesc.getFilterMap();
+
+    // 1. Extend ReduceSinkOperator if its output is filtered by MapJoinOperator.
+    for (byte pos = 0; pos < filterMap.length; pos++) {
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        continue;
+      }
+
+      ExprNodeDesc filterTagExpr =
+          generateFilterTagExpression(filterMap[pos], mapJoinDesc.getFilters().get(pos));
+
+      // Note that the parent RS for the given pos is retrieved in a different way in MapJoinProcessor.
+      // TODO: MapJoinProcessor.convertMapJoin() fixes the order of parent operators.
+      //  Do other callers also fix the order, as MapJoinProcessor.convertMapJoin() does?
+      ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+      ReduceSinkDesc pRsConf = parent.getConf();
+
+      // MapJoinProcessor.getMapJoinDesc() replaces filter expressions with backtracked ones if
+      // adjustParentsChildren is true.
+      // As of now, ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin() is the only function that
+      // calls this method with adjustParentsChildren = false. Therefore, we backtrack filter expressions
+      // only if MapJoinDesc.isDynamicPartitionHashJoin is true, which holds only for joins created by
+      // ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin().
+      ExprNodeDesc mapSideFilterTagExpr;
+      if (mapJoinDesc.isDynamicPartitionHashJoin()) {
+        mapSideFilterTagExpr = ExprNodeDescUtils.backtrack(filterTagExpr, mapJoinOp, parent);
+      } else {
+        mapSideFilterTagExpr = filterTagExpr;
+      }
+      String filterColumnName = "_filterTag";
+
+      pRsConf.getValueCols().add(mapSideFilterTagExpr);
+      pRsConf.getOutputValueColumnNames().add(filterColumnName);
+      pRsConf.getColumnExprMap()
+          .put(Utilities.ReduceField.VALUE + "." + filterColumnName, mapSideFilterTagExpr);
+
+      ColumnInfo filterTagColumnInfo =
+          new ColumnInfo(Utilities.ReduceField.VALUE + "." + filterColumnName, shortType, "", false);
+      parent.getSchema().getSignature().add(filterTagColumnInfo);
+
+      TableDesc newTableDesc =
+          PlanUtils.getReduceValueTableDesc(
+              PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "_col"));
+      pRsConf.setValueSerializeInfo(newTableDesc);
+    }
+
+    // 2. Update MapJoinOperator's valueFilteredTableDescs.
+    // Unlike HashTableSinkOperator used in MR engine, Tez engine directly passes rows from RS to MapJoin.
+    // Therefore, RS's writer and MapJoin's reader should have the same TableDesc. We create valueTableDesc
+    // here again because it can be different from RS's valueSerializeInfo due to ColumnPruner.
+    List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>();
+    for (byte pos = 0; pos < mapJoinOp.getParentOperators().size(); pos++) {
+      TableDesc tableDesc;
+
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        // We did not change corresponding parent operator. Use the original tableDesc.
+        tableDesc = mapJoinDesc.getValueFilteredTblDescs().get(pos);
+      } else {
+        // Create a new TableDesc based on corresponding parent RSOperator.
+        ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+        ReduceSinkDesc pRsConf = parent.getConf();
+
+        tableDesc =
+            PlanUtils.getMapJoinValueTableDesc(
+                PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "mapjoinvalue"));
+      }
+
+      newMapJoinValueFilteredTableDescs.add(tableDesc);
+    }
+    mapJoinDesc.setValueFilteredTblDescs(newMapJoinValueFilteredTableDescs);
+
+    return null;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses the following method:
+   * JoinUtil#isFiltered(Object, List<ExprNodeEvaluator>, List<ObjectInspector>, int[]).
+   */
+  private ExprNodeDesc generateFilterTagExpression(int[] filterMap, List<ExprNodeDesc> filterExprs) {
+    ExprNodeDesc filterTagExpr = new ExprNodeConstantDesc(shortType, (short) 0);
+    Map<Byte, ExprNodeDesc> filterExprMap = getFilterExprMap(filterMap, filterExprs);
+
+    for (Map.Entry<Byte, ExprNodeDesc> entry: filterExprMap.entrySet()) {
+      ExprNodeDesc filterTagMaskExpr = generateFilterTagMask(entry.getKey(), entry.getValue());
+
+      if (filterTagExpr instanceof ExprNodeConstantDesc) {
+        filterTagExpr = filterTagMaskExpr;
+      } else {
+        List<ExprNodeDesc> plusArgs = new ArrayList<>(2);
+        plusArgs.add(filterTagMaskExpr);
+        plusArgs.add(filterTagExpr);
+        filterTagExpr = new ExprNodeGenericFuncDesc(shortType, new GenericUDFOPPlus(), plusArgs);
+      }
+    }
+
+    return filterTagExpr;
+  }
+
+  /**
+   * Group filterExprs by tag and merge each of them into a single boolean ExprNodeDesc using AND operator.
+   * filterInfo is a sequence of (tag, length) pairs; length is the number of filter expressions for that tag.
+   * For example, filterInfo = {0, 2, 1, 3} means that the first 2 elements in filterExprs belong to tag 0,
+   * and the remaining 3 elements belong to tag 1.
+   */
+  private Map<Byte, ExprNodeDesc> getFilterExprMap(int[] filterInfo, List<ExprNodeDesc> filterExprs) {
+    Map<Byte, ExprNodeDesc> filterExprMap = new HashMap<>();
+
+    int exprListOffset = 0;
+    for (int idx = 0; idx < filterInfo.length; idx = idx + 2) {
+      byte tag = (byte) filterInfo[idx];
+      int length = filterInfo[idx + 1];
+
+      int nextExprOffset = exprListOffset + length;
+      List<ExprNodeDesc> andArgs = filterExprs.subList(exprListOffset, nextExprOffset);
+      exprListOffset = nextExprOffset;
+
+      if (andArgs.size() == 1) {
+        filterExprMap.put(tag, andArgs.get(0));
+      } else if (andArgs.size() > 1) {
+        filterExprMap.put(tag, ExprNodeDescUtils.and(andArgs));
+      }
+    }
+
+    return filterExprMap;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses the following code:
+   *   if (!condition) { return (short) (1 << tag); } else { return (short) 0; }.
+   */
+  private ExprNodeDesc generateFilterTagMask(byte tag, ExprNodeDesc condition) {
+    ExprNodeDesc filterMaskValue = new ExprNodeConstantDesc(shortType, (short) (1 << tag));
+
+    List<ExprNodeDesc> negateArg = new ArrayList<>(1);
+    negateArg.add(condition);
+    ExprNodeDesc negate = new ExprNodeGenericFuncDesc(
+        TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.BOOLEAN_TYPE_NAME),
+        new GenericUDFOPNot(),
+        negateArg);
+
+    GenericUDFBridge toShort = new GenericUDFBridge();
+    toShort.setUdfClassName(UDFToShort.class.getName());
+    toShort.setUdfName(UDFToShort.class.getSimpleName());
+
+    List<ExprNodeDesc> toShortArg = new ArrayList<>(1);
+    toShortArg.add(negate);

Review Comment:
   simplified it
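
   As a cross-check on the hunk quoted above, the branch described in the Javadoc of generateFilterTagMask and the multiply-based form built from NOT, a cast to short, and a constant mask can be compared in plain Java. This is only a sketch under assumptions: `FilterTagMaskSketch` and both method names are hypothetical, and the cast is assumed to map true to 1 and false to 0.

```java
public class FilterTagMaskSketch {
  // Branching form, as written in the Javadoc of generateFilterTagMask.
  public static short maskWithBranch(byte tag, boolean condition) {
    if (!condition) {
      return (short) (1 << tag);
    } else {
      return (short) 0;
    }
  }

  // Arithmetic form: turn NOT(condition) into 0 or 1, then multiply by the mask value.
  // Assumes the boolean-to-short cast yields 1 for true and 0 for false.
  public static short maskWithMultiply(byte tag, boolean condition) {
    short negatedAsShort = (short) (!condition ? 1 : 0);
    return (short) (negatedAsShort * (1 << tag));
  }

  public static void main(String[] args) {
    for (byte tag = 0; tag < 8; tag++) {
      for (boolean condition : new boolean[] {true, false}) {
        if (maskWithBranch(tag, condition) != maskWithMultiply(tag, condition)) {
          throw new AssertionError("forms disagree for tag " + tag);
        }
      }
    }
    System.out.println("branch and multiply forms agree for tags 0..7");
  }
}
```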



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FiltertagAppenderProc.java:
##########
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+/**
+ * Append a filterTag computation column to ReduceSinkOperators whose child is MapJoinOperator.
+ * The added column expresses JoinUtil#isFiltered in the form of ExprNode.
+ * The added column should be located at the end of row as CommonJoinOperator expects it.
+ *
+ * This processor only affects small tables of MapJoin that run on Tez engine.
+ * For big table, MapJoinOperator#process() calls CommonJoinOperator#getFilteredValue(), which adds filterTag.
+ * For MapReduce engine, HashTableSinkOperator adds filterTag to every row.
+ */
+public class FiltertagAppenderProc implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() == null) {
+      return null;
+    }
+
+    int[][] filterMap = mapJoinDesc.getFilterMap();
+
+    // 1. Extend ReduceSinkOperator if its output is filtered by MapJoinOperator.
+    for (byte pos = 0; pos < filterMap.length; pos++) {
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        continue;
+      }
+
+      ExprNodeDesc filterTagExpr =
+          generateFilterTagExpression(filterMap[pos], mapJoinDesc.getFilters().get(pos));
+
+      // Note that the parent RS for the given pos is retrieved in a different way in MapJoinProcessor.
+      // TODO: MapJoinProcessor.convertMapJoin() fixes the order of parent operators.
+      //  Do other callers also fix the order, as MapJoinProcessor.convertMapJoin() does?
+      ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+      ReduceSinkDesc pRsConf = parent.getConf();
+
+      // MapJoinProcessor.getMapJoinDesc() replaces filter expressions with backtracked ones if
+      // adjustParentsChildren is true.
+      // As of now, ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin() is the only function that
+      // calls this method with adjustParentsChildren = false. Therefore, we backtrack filter expressions
+      // only if MapJoinDesc.isDynamicPartitionHashJoin is true, which holds only for joins created by
+      // ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin().
+      ExprNodeDesc mapSideFilterTagExpr;
+      if (mapJoinDesc.isDynamicPartitionHashJoin()) {
+        mapSideFilterTagExpr = ExprNodeDescUtils.backtrack(filterTagExpr, mapJoinOp, parent);
+      } else {
+        mapSideFilterTagExpr = filterTagExpr;
+      }
+      String filterColumnName = "_filterTag";
+
+      pRsConf.getValueCols().add(mapSideFilterTagExpr);
+      pRsConf.getOutputValueColumnNames().add(filterColumnName);
+      pRsConf.getColumnExprMap()
+          .put(Utilities.ReduceField.VALUE + "." + filterColumnName, mapSideFilterTagExpr);
+
+      ColumnInfo filterTagColumnInfo =
+          new ColumnInfo(Utilities.ReduceField.VALUE + "." + filterColumnName, shortType, "", false);
+      parent.getSchema().getSignature().add(filterTagColumnInfo);
+
+      TableDesc newTableDesc =
+          PlanUtils.getReduceValueTableDesc(
+              PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "_col"));
+      pRsConf.setValueSerializeInfo(newTableDesc);
+    }
+
+    // 2. Update MapJoinOperator's valueFilteredTableDescs.
+    // Unlike HashTableSinkOperator used in MR engine, Tez engine directly passes rows from RS to MapJoin.
+    // Therefore, RS's writer and MapJoin's reader should have the same TableDesc. We create valueTableDesc
+    // here again because it can be different from RS's valueSerializeInfo due to ColumnPruner.
+    List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>();
+    for (byte pos = 0; pos < mapJoinOp.getParentOperators().size(); pos++) {
+      TableDesc tableDesc;
+
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        // We did not change corresponding parent operator. Use the original tableDesc.
+        tableDesc = mapJoinDesc.getValueFilteredTblDescs().get(pos);
+      } else {
+        // Create a new TableDesc based on corresponding parent RSOperator.
+        ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+        ReduceSinkDesc pRsConf = parent.getConf();
+
+        tableDesc =
+            PlanUtils.getMapJoinValueTableDesc(
+                PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "mapjoinvalue"));
+      }
+
+      newMapJoinValueFilteredTableDescs.add(tableDesc);
+    }
+    mapJoinDesc.setValueFilteredTblDescs(newMapJoinValueFilteredTableDescs);
+
+    return null;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses the following method:
+   * JoinUtil#isFiltered(Object, List<ExprNodeEvaluator>, List<ObjectInspector>, int[]).
+   */
+  private ExprNodeDesc generateFilterTagExpression(int[] filterMap, List<ExprNodeDesc> filterExprs) {
+    ExprNodeDesc filterTagExpr = new ExprNodeConstantDesc(shortType, (short) 0);
+    Map<Byte, ExprNodeDesc> filterExprMap = getFilterExprMap(filterMap, filterExprs);
+
+    for (Map.Entry<Byte, ExprNodeDesc> entry: filterExprMap.entrySet()) {
+      ExprNodeDesc filterTagMaskExpr = generateFilterTagMask(entry.getKey(), entry.getValue());
+
+      if (filterTagExpr instanceof ExprNodeConstantDesc) {
+        filterTagExpr = filterTagMaskExpr;
+      } else {
+        List<ExprNodeDesc> plusArgs = new ArrayList<>(2);
+        plusArgs.add(filterTagMaskExpr);
+        plusArgs.add(filterTagExpr);
+        filterTagExpr = new ExprNodeGenericFuncDesc(shortType, new GenericUDFOPPlus(), plusArgs);
+      }
+    }
+
+    return filterTagExpr;
+  }
+
+  /**
+   * Group filterExprs by tag and merge each of them into a single boolean ExprNodeDesc using AND operator.
+   * filterInfo is a sequence of (tag, length) pairs; length is the number of filter expressions for that tag.
+   * For example, filterInfo = {0, 2, 1, 3} means that the first 2 elements in filterExprs belong to tag 0,
+   * and the remaining 3 elements belong to tag 1.
+   */
+  private Map<Byte, ExprNodeDesc> getFilterExprMap(int[] filterInfo, List<ExprNodeDesc> filterExprs) {
+    Map<Byte, ExprNodeDesc> filterExprMap = new HashMap<>();
+
+    int exprListOffset = 0;
+    for (int idx = 0; idx < filterInfo.length; idx = idx + 2) {
+      byte tag = (byte) filterInfo[idx];
+      int length = filterInfo[idx + 1];
+
+      int nextExprOffset = exprListOffset + length;
+      List<ExprNodeDesc> andArgs = filterExprs.subList(exprListOffset, nextExprOffset);
+      exprListOffset = nextExprOffset;
+
+      if (andArgs.size() == 1) {
+        filterExprMap.put(tag, andArgs.get(0));
+      } else if (andArgs.size() > 1) {
+        filterExprMap.put(tag, ExprNodeDescUtils.and(andArgs));
+      }
+    }
+
+    return filterExprMap;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses the following code:
+   *   if (!condition) { return (short) (1 << tag); } else { return (short) 0; }.
+   */
+  private ExprNodeDesc generateFilterTagMask(byte tag, ExprNodeDesc condition) {
+    ExprNodeDesc filterMaskValue = new ExprNodeConstantDesc(shortType, (short) (1 << tag));
+
+    List<ExprNodeDesc> negateArg = new ArrayList<>(1);
+    negateArg.add(condition);
+    ExprNodeDesc negate = new ExprNodeGenericFuncDesc(
+        TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.BOOLEAN_TYPE_NAME),
+        new GenericUDFOPNot(),
+        negateArg);
+
+    GenericUDFBridge toShort = new GenericUDFBridge();
+    toShort.setUdfClassName(UDFToShort.class.getName());
+    toShort.setUdfName(UDFToShort.class.getSimpleName());
+
+    List<ExprNodeDesc> toShortArg = new ArrayList<>(1);
+    toShortArg.add(negate);
+    ExprNodeDesc conditionAsShort = new ExprNodeGenericFuncDesc(shortType, toShort, toShortArg);
+
+    List<ExprNodeDesc> multiplyArgs = new ArrayList<>(2);
+    multiplyArgs.add(conditionAsShort);
+    multiplyArgs.add(filterMaskValue);

Review Comment:
   simplified it
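
   One point worth spelling out about the quoted generateFilterTagExpression: it combines the per-tag masks with plus rather than bitwise OR. The two agree as long as every tag contributes at most one mask, which is what the map keyed by tag in getFilterExprMap suggests. A minimal sketch of that equivalence follows; `FilterTagSumSketch` and its methods are hypothetical names, not Hive code.

```java
public class FilterTagSumSketch {
  // Combine masks by addition, mirroring the chain of plus expressions in the hunk.
  public static short combineWithPlus(short[] masks) {
    short sum = 0;
    for (short mask : masks) {
      sum += mask;
    }
    return sum;
  }

  // Combine masks by bitwise OR, as a bit-set style reference.
  public static short combineWithOr(short[] masks) {
    short bits = 0;
    for (short mask : masks) {
      bits |= mask;
    }
    return bits;
  }

  public static void main(String[] args) {
    // Each entry is either 0 (tag passed) or 1 << tag for a distinct tag (tag failed).
    short[] masks = {(short) (1 << 0), 0, (short) (1 << 2)};
    System.out.println(combineWithPlus(masks)); // prints 5
    System.out.println(combineWithOr(masks));   // prints 5
  }
}
```

   If the same tag could contribute two non-zero masks, addition would carry into the next bit and the two forms would diverge, so the sketch relies on the one-mask-per-tag assumption.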



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FiltertagAppenderProc.java:
##########
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+/**
+ * Append a filterTag computation column to ReduceSinkOperators whose child is MapJoinOperator.
+ * The added column expresses JoinUtil#isFiltered in the form of ExprNode.
+ * The added column should be located at the end of row as CommonJoinOperator expects it.
+ *
+ * This processor only affects small tables of MapJoin that run on Tez engine.
+ * For big table, MapJoinOperator#process() calls CommonJoinOperator#getFilteredValue(), which adds filterTag.
+ * For MapReduce engine, HashTableSinkOperator adds filterTag to every row.
+ */
+public class FiltertagAppenderProc implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() == null) {
+      return null;
+    }
+
+    int[][] filterMap = mapJoinDesc.getFilterMap();
+
+    // 1. Extend ReduceSinkOperator if its output is filtered by MapJoinOperator.
+    for (byte pos = 0; pos < filterMap.length; pos++) {
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        continue;
+      }
+
+      ExprNodeDesc filterTagExpr =
+          generateFilterTagExpression(filterMap[pos], mapJoinDesc.getFilters().get(pos));
+
+      // Note that the parent RS for the given pos is retrieved in a different way in MapJoinProcessor.
+      // TODO: MapJoinProcessor.convertMapJoin() fixes the order of parent operators.
+      //  Do other callers also fix the order, as MapJoinProcessor.convertMapJoin() does?
+      ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+      ReduceSinkDesc pRsConf = parent.getConf();
+
+      // MapJoinProcessor.getMapJoinDesc() replaces filter expressions with backtracked ones if
+      // adjustParentsChildren is true.
+      // As of now, ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin() is the only function that
+      // calls this method with adjustParentsChildren = false. Therefore, we backtrack filter expressions
+      // only if MapJoinDesc.isDynamicPartitionHashJoin is true, which is also the unique property of
+      // ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin().
+      ExprNodeDesc mapSideFilterTagExpr;
+      if (mapJoinDesc.isDynamicPartitionHashJoin()) {
+        mapSideFilterTagExpr = ExprNodeDescUtils.backtrack(filterTagExpr, mapJoinOp, parent);
+      } else {
+        mapSideFilterTagExpr = filterTagExpr;
+      }
+      String filterColumnName = "_filterTag";
+
+      pRsConf.getValueCols().add(mapSideFilterTagExpr);
+      pRsConf.getOutputValueColumnNames().add(filterColumnName);
+      pRsConf.getColumnExprMap()
+          .put(Utilities.ReduceField.VALUE + "." + filterColumnName, mapSideFilterTagExpr);
+
+      ColumnInfo filterTagColumnInfo =
+          new ColumnInfo(Utilities.ReduceField.VALUE + "." + filterColumnName, shortType, "", false);
+      parent.getSchema().getSignature().add(filterTagColumnInfo);
+
+      TableDesc newTableDesc =
+          PlanUtils.getReduceValueTableDesc(
+              PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "_col"));
+      pRsConf.setValueSerializeInfo(newTableDesc);
+    }
+
+    // 2. Update MapJoinOperator's valueFilteredTableDescs.
+    // Unlike HashTableSinkOperator used in MR engine, Tez engine directly passes rows from RS to MapJoin.
+    // Therefore, RS's writer and MapJoin's reader should have the same TableDesc. We create valueTableDesc
+    // here again because it can be different from RS's valueSerializeInfo due to ColumnPruner.
+    List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>();
+    for (byte pos = 0; pos < mapJoinOp.getParentOperators().size(); pos++) {
+      TableDesc tableDesc;
+
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        // We did not change corresponding parent operator. Use the original tableDesc.
+        tableDesc = mapJoinDesc.getValueFilteredTblDescs().get(pos);
+      } else {
+        // Create a new TableDesc based on corresponding parent RSOperator.
+        ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+        ReduceSinkDesc pRsConf = parent.getConf();
+
+        tableDesc =
+            PlanUtils.getMapJoinValueTableDesc(
+                PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "mapjoinvalue"));
+      }
+
+      newMapJoinValueFilteredTableDescs.add(tableDesc);
+    }
+    mapJoinDesc.setValueFilteredTblDescs(newMapJoinValueFilteredTableDescs);
+
+    return null;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses the following method:
+   * JoinUtil#isFiltered(Object, List<ExprNodeEvaluator>, List<ObjectInspector>, int[]).
+   */
+  private ExprNodeDesc generateFilterTagExpression(int[] filterMap, List<ExprNodeDesc> filterExprs) {
+    ExprNodeDesc filterTagExpr = new ExprNodeConstantDesc(shortType, (short) 0);
+    Map<Byte, ExprNodeDesc> filterExprMap = getFilterExprMap(filterMap, filterExprs);
+
+    for (Map.Entry<Byte, ExprNodeDesc> entry: filterExprMap.entrySet()) {
+      ExprNodeDesc filterTagMaskExpr = generateFilterTagMask(entry.getKey(), entry.getValue());
+
+      if (filterTagExpr instanceof ExprNodeConstantDesc) {
+        filterTagExpr = filterTagMaskExpr;
+      } else {
+        List<ExprNodeDesc> plusArgs = new ArrayList<>(2);
+        plusArgs.add(filterTagMaskExpr);
+        plusArgs.add(filterTagExpr);
+        filterTagExpr = new ExprNodeGenericFuncDesc(shortType, new GenericUDFOPPlus(), plusArgs);
+      }
+    }
+
+    return filterTagExpr;
+  }
+
+  /**
+   * Group filterExprs by tag and merge each of them into a single boolean ExprNodeDesc using AND operator.
+   * filterInfo is a repeated sequence of pairs: a tag followed by the number of filter expressions for that tag.
+   * For example, filterInfo = {0, 2, 1, 3} means that the first 2 elements in filterExprs belong to tag 0,
+   * and the remaining 3 elements belong to tag 1.
+   */
+  private Map<Byte, ExprNodeDesc> getFilterExprMap(int[] filterInfo, List<ExprNodeDesc> filterExprs) {
+    Map<Byte, ExprNodeDesc> filterExprMap = new HashMap<>();
+
+    int exprListOffset = 0;
+    for (int idx = 0; idx < filterInfo.length; idx = idx + 2) {
+      byte tag = (byte) filterInfo[idx];
+      int length = filterInfo[idx + 1];
+
+      int nextExprOffset = exprListOffset + length;
+      List<ExprNodeDesc> andArgs = filterExprs.subList(exprListOffset, nextExprOffset);
+      exprListOffset = nextExprOffset;
+
+      if (andArgs.size() == 1) {
+        filterExprMap.put(tag, andArgs.get(0));
+      } else if (andArgs.size() > 1) {
+        filterExprMap.put(tag, ExprNodeDescUtils.and(andArgs));
+      }
+    }
+
+    return filterExprMap;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses the following code:
+   *   if (!condition) { return (short) (1 << tag) } else { return (short) 0; }.
+   */
+  private ExprNodeDesc generateFilterTagMask(byte tag, ExprNodeDesc condition) {
+    ExprNodeDesc filterMaskValue = new ExprNodeConstantDesc(shortType, (short) (1 << tag));
+
+    List<ExprNodeDesc> negateArg = new ArrayList<>(1);
+    negateArg.add(condition);

Review Comment:
   simplified it
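
   As a reading aid for the quoted `generateFilterTagExpression`, `getFilterExprMap`, and `generateFilterTagMask` methods, the following plain-Java sketch computes the value their javadocs describe for a single row. It is only an illustration under assumptions: `FilterTagSketch` and `computeFilterTag` are hypothetical names, the filter conditions are supplied as already-evaluated booleans rather than ExprNodeDescs, and it does not call `JoinUtil#isFiltered` or any other Hive API.

   ```java
   import java.util.Arrays;
   import java.util.List;

   public class FilterTagSketch {

     // filterInfo is a flat sequence of (tag, count) pairs. results holds the
     // evaluated filter conditions in the same order as the filter expressions.
     static short computeFilterTag(int[] filterInfo, List<Boolean> results) {
       short filterTag = 0;
       int offset = 0;
       for (int idx = 0; idx < filterInfo.length; idx += 2) {
         int tag = filterInfo[idx];
         int length = filterInfo[idx + 1];

         // AND together the conditions that belong to this tag.
         boolean passed = true;
         for (boolean result : results.subList(offset, offset + length)) {
           passed &= result;
         }
         offset += length;

         // A failing group contributes the mask (1 << tag); a passing group contributes 0.
         if (!passed) {
           filterTag += (short) (1 << tag);
         }
       }
       return filterTag;
     }

     public static void main(String[] args) {
       int[] filterInfo = {0, 2, 1, 3};
       List<Boolean> results = Arrays.asList(true, true, true, false, true);
       System.out.println(computeFilterTag(filterInfo, results));
     }
   }
   ```

   With `filterInfo = {0, 2, 1, 3}`, the first two conditions belong to tag 0 and the next three to tag 1. In the `main` example the tag 0 group passes and the tag 1 group fails, so the sketch prints `2` (that is, `1 << 1`).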




[GitHub] [hive] ngsg commented on a diff in pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "ngsg (via GitHub)" <gi...@apache.org>.

ngsg commented on code in PR #4115:
URL: https://github.com/apache/hive/pull/4115#discussion_r1317147717


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FiltertagAppenderProc.java:
##########
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+/**
+ * Append a filterTag computation column to ReduceSinkOperators whose child is MapJoinOperator.
+ * The added column expresses JoinUtil#isFiltered in the form of ExprNode.
+ * The added column should be located at the end of row as CommonJoinOperator expects it.
+ *
+ * This processor only affects small tables of MapJoin that run on Tez engine.
+ * For big table, MapJoinOperator#process() calls CommonJoinOperator#getFilteredValue(), which adds filterTag.
+ * For MapReduce engine, HashTableSinkOperator adds filterTag to every row.
+ */
+public class FiltertagAppenderProc implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() == null) {
+      return null;
+    }
+
+    int[][] filterMap = mapJoinDesc.getFilterMap();
+
+    // 1. Extend ReduceSinkOperator if its output is filtered by MapJoinOperator.
+    for (byte pos = 0; pos < filterMap.length; pos++) {
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        continue;
+      }
+
+      ExprNodeDesc filterTagExpr =
+          generateFilterTagExpression(filterMap[pos], mapJoinDesc.getFilters().get(pos));
+
+      // Note that the parent RS for the given pos is retrieved in a different way in MapJoinProcessor.
+      // TODO: MapJoinProcessor.convertMapJoin() fixes the order of parent operators.
+      //  Do other callers also fix the order, as MapJoinProcessor.convertMapJoin() does?
+      ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+      ReduceSinkDesc pRsConf = parent.getConf();
+
+      // MapJoinProcessor.getMapJoinDesc() replaces filter expressions with backtracked ones if
+      // adjustParentsChildren is true.
+      // As of now, ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin() is the only function that
+      // calls this method with adjustParentsChildren = false. Therefore, we backtrack filter expressions
+      // only if MapJoinDesc.isDynamicPartitionHashJoin is true, which is also the unique property of
+      // ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin().
+      ExprNodeDesc mapSideFilterTagExpr;
+      if (mapJoinDesc.isDynamicPartitionHashJoin()) {
+        mapSideFilterTagExpr = ExprNodeDescUtils.backtrack(filterTagExpr, mapJoinOp, parent);
+      } else {
+        mapSideFilterTagExpr = filterTagExpr;
+      }
+      String filterColumnName = "_filterTag";
+
+      pRsConf.getValueCols().add(mapSideFilterTagExpr);
+      pRsConf.getOutputValueColumnNames().add(filterColumnName);
+      pRsConf.getColumnExprMap()
+          .put(Utilities.ReduceField.VALUE + "." + filterColumnName, mapSideFilterTagExpr);
+
+      ColumnInfo filterTagColumnInfo =
+          new ColumnInfo(Utilities.ReduceField.VALUE + "." + filterColumnName, shortType, "", false);
+      parent.getSchema().getSignature().add(filterTagColumnInfo);
+
+      TableDesc newTableDesc =
+          PlanUtils.getReduceValueTableDesc(
+              PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "_col"));
+      pRsConf.setValueSerializeInfo(newTableDesc);
+    }
+
+    // 2. Update MapJoinOperator's valueFilteredTableDescs.
+    // Unlike HashTableSinkOperator used in MR engine, Tez engine directly passes rows from RS to MapJoin.
+    // Therefore, RS's writer and MapJoin's reader should have the same TableDesc. We create valueTableDesc
+    // here again because it can be different from RS's valueSerializeInfo due to ColumnPruner.
+    List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>();
+    for (byte pos = 0; pos < mapJoinOp.getParentOperators().size(); pos++) {

Review Comment:
   Of course we know the size. I fixed the list creation to use the size.
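
   A minimal sketch of that kind of fix, assuming the list is built with one element per parent operator: the capacity is passed to the `ArrayList` constructor up front. The class and method names below are hypothetical and plain strings stand in for `TableDesc`; the revised Hive code itself is not quoted in this message.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class PresizedListSketch {

     // Builds one descriptor per parent. Because the final size is known in
     // advance, the backing array is allocated once and never has to grow.
     static List<String> describeParents(List<String> parents) {
       List<String> descs = new ArrayList<>(parents.size());
       for (String parent : parents) {
         descs.add("desc:" + parent);
       }
       return descs;
     }
   }
   ```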




[GitHub] [hive] sonarcloud[bot] commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.

sonarcloud[bot] commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1718980321

   Kudos, SonarCloud Quality Gate passed!
   https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4115

   0 Bugs (rating A)
   0 Vulnerabilities (rating A)
   0 Security Hotspots (rating A)
   10 Code Smells (rating A)

   No Coverage information
   No Duplication information

   Warning: The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
   Read more here: https://docs.sonarcloud.io/appendices/scanner-environment/



[GitHub] [hive] kasakrisz commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "kasakrisz (via GitHub)" <gi...@apache.org>.

kasakrisz commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1718845237

   @ngsg
   Could you please rebase this patch onto master and update the golden file of the test mapjoin_filter_on_outerjoin_tez.q?



[GitHub] [hive] sonarcloud[bot] commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.

sonarcloud[bot] commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1517543333

   Kudos, SonarCloud Quality Gate passed!
   https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4115

   0 Bugs (rating A)
   0 Vulnerabilities (rating A)
   0 Security Hotspots (rating A)
   11 Code Smells (rating A)

   No Coverage information
   No Duplication information



[GitHub] [hive] kasakrisz commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "kasakrisz (via GitHub)" <gi...@apache.org>.

kasakrisz commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1719303036

   @ngsg 
   I checked out your branch and ran the test:
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=mapjoin_filter_on_outerjoin_tez.q -pl itests/qtest -Pitests
   ```
   The golden file changed. The changes are the same as those shown by the last PTest run:
   http://ci.hive.apache.org/job/hive-precommit/job/PR-4115/7/testReport/org.apache.hadoop.hive.cli.split13/TestMiniLlapLocalCliDriver/Testing___split_14___PostProcess___testCliDriver_mapjoin_filter_on_outerjoin_tez_/
   ```
   diff --git a/ql/src/test/results/clientpositive/llap/mapjoin_filter_on_outerjoin_tez.q.out b/ql/src/test/results/clientpositive/llap/mapjoin_filter_on_outerjoin_tez.q.out
   index b5874fc236..5080aed095 100644
   --- a/ql/src/test/results/clientpositive/llap/mapjoin_filter_on_outerjoin_tez.q.out
   +++ b/ql/src/test/results/clientpositive/llap/mapjoin_filter_on_outerjoin_tez.q.out
   @@ -687,8 +687,10 @@ NULL       NULL    66      val_66
    NULL   NULL    98      val_98
    PREHOOK: query: DROP TABLE IF EXISTS c
    PREHOOK: type: DROPTABLE
   +PREHOOK: Output: database:default
    POSTHOOK: query: DROP TABLE IF EXISTS c
    POSTHOOK: type: DROPTABLE
   +POSTHOOK: Output: database:default
    PREHOOK: query: CREATE TABLE c (key int, value int)
    PREHOOK: type: CREATETABLE
    PREHOOK: Output: database:default
   @@ -709,8 +711,10 @@ POSTHOOK: Lineage: c.key SCRIPT []
    POSTHOOK: Lineage: c.value SCRIPT []
    PREHOOK: query: DROP TABLE IF EXISTS d
    PREHOOK: type: DROPTABLE
   +PREHOOK: Output: database:default
    POSTHOOK: query: DROP TABLE IF EXISTS d
    POSTHOOK: type: DROPTABLE
   +POSTHOOK: Output: database:default
    PREHOOK: query: CREATE TABLE d (key int, value int)
    PREHOOK: type: CREATETABLE
    PREHOOK: Output: database:default
   ```
   Could you please push these changes to your branch?



[GitHub] [hive] kasakrisz commented on a diff in pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "kasakrisz (via GitHub)" <gi...@apache.org>.

kasakrisz commented on code in PR #4115:
URL: https://github.com/apache/hive/pull/4115#discussion_r1315745030


##########
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryStruct.java:
##########
@@ -221,7 +221,7 @@ public void init(BinaryComparable src) {
       fieldBytes = src.getBytes();
       int length = src.getLength();
       byte nullByte = fieldBytes[0];
-      int lastFieldByteEnd = 1, fieldStart = -1, fieldLength = -1;

Review Comment:
   `fieldStart` and `fieldLength` were local variables; with this change the fields are used instead, and this method modifies them. Is this intended?
   Can these remain local variables? If yes, please remove the field declarations. If no, could you please put each field declaration on a separate line?
   ```
   private int fieldStart;
   private int fieldLength;
   ```
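
   The concern can be illustrated with a small sketch, which is not the LazyBinaryStruct code: a value kept in a field survives from one call to the next, while a local variable starts fresh on every call. `FieldVersusLocalSketch` and its methods are hypothetical names.

   ```java
   public class FieldVersusLocalSketch {

     // Field: whatever one call writes here is visible to the next call.
     private int fieldStart = -1;

     // Returns the value seen on entry, then overwrites the field.
     int initWithField(int start) {
       int previous = fieldStart;
       fieldStart = start;
       return previous;
     }

     // Same logic with a local variable: every call begins at -1 again.
     int initWithLocal(int start) {
       int localStart = -1;
       int previous = localStart;
       localStart = start;
       return previous;
     }
   }
   ```

   Calling `initWithField(5)` and then `initWithField(7)` returns `-1` and then `5`, whereas `initWithLocal` returns `-1` both times.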



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ExtendParentReduceSinkOfMapJoin.java:
##########
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+public class ExtendParentReduceSinkOfMapJoin implements SemanticNodeProcessor {

Review Comment:
   The name `ExtendParentReduceSinkOfMapJoin` sounds like a method name, and it is too general: extending can mean anything. Could you please rename this class to something more specific and add javadocs that explain its purpose in detail?



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ExtendParentReduceSinkOfMapJoin.java:
##########
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+public class ExtendParentReduceSinkOfMapJoin implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() != null) {
+      int[][] filterMap = mapJoinDesc.getFilterMap();
+
+      // 1. Extend ReduceSinkOperator if its output is filtered by MapJoinOperator.
+      for (byte pos = 0; pos < filterMap.length; pos++) {
+        if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+          continue;
+        }
+
+        ExprNodeDesc filterTagExpr =
+            generateFilterTagExpression(filterMap[pos], mapJoinDesc.getFilters().get(pos));
+
+        // Note that the parent RS for the given pos is retrieved in a different way in MapJoinProcessor.
+        // TODO: MapJoinProcessor.convertMapJoin() fixes the order of parent operators.
+        //  Do other callers also fix the order, as MapJoinProcessor.convertMapJoin() does?
+        ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+        ReduceSinkDesc pRsConf = parent.getConf();
+
+        // MapJoinProcessor.getMapJoinDesc() replaces filter expressions with backtracked ones if
+        // adjustParentsChildren is true.
+        // As of now, ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin() is the only function that
+        // calls this method with adjustParentsChildren = false. Therefore, we backtrack filter expressions
+        // only if MapJoinDesc.isDynamicPartitionHashJoin is true, which is also the unique property of
+        // ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin().
+        ExprNodeDesc mapSideFilterTagExpr;
+        if (mapJoinDesc.isDynamicPartitionHashJoin()) {
+          mapSideFilterTagExpr = ExprNodeDescUtils.backtrack(filterTagExpr, mapJoinOp, parent);
+        } else {
+          mapSideFilterTagExpr = filterTagExpr;
+        }
+        String filterColumnName = "_filterTag";
+
+        pRsConf.getValueCols().add(mapSideFilterTagExpr);
+        pRsConf.getOutputValueColumnNames().add(filterColumnName);
+        pRsConf.getColumnExprMap()
+            .put(Utilities.ReduceField.VALUE + "." + filterColumnName, mapSideFilterTagExpr);
+
+        ColumnInfo filterTagColumnInfo =
+            new ColumnInfo(Utilities.ReduceField.VALUE + "." + filterColumnName, shortType, "", false);
+        parent.getSchema().getSignature().add(filterTagColumnInfo);
+
+        TableDesc newTableDesc =
+            PlanUtils.getReduceValueTableDesc(
+                PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "_col"));
+        pRsConf.setValueSerializeInfo(newTableDesc);
+      }
+
+      // 2. Update MapJoinOperator's valueFilteredTableDescs.
+      // Unlike HashTableSinkOperator used in MR engine, Tez engine directly passes rows from RS to MapJoin.
+      // Therefore, RS's writer and MapJoin's reader should have the same TableDesc. We create valueTableDesc
+      // here again because it can be different from RS's valueSerializeInfo due to ColumnPruner.
+      List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>();
+      for (byte pos = 0; pos < mapJoinOp.getParentOperators().size(); pos++) {
+        TableDesc tableDesc;
+
+        if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+          // We did not change corresponding parent operator. Use the original tableDesc.
+          tableDesc = mapJoinDesc.getValueFilteredTblDescs().get(pos);
+        } else {
+          // Create a new TableDesc based on corresponding parent RSOperator.
+          ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+          ReduceSinkDesc pRsConf = parent.getConf();
+
+          tableDesc =
+              PlanUtils.getMapJoinValueTableDesc(
+                  PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "mapjoinvalue"));
+        }
+
+        newMapJoinValueFilteredTableDescs.add(tableDesc);
+      }
+      mapJoinDesc.setValueFilteredTblDescs(newMapJoinValueFilteredTableDescs);
+    }
+
+    return null;
+  }
+
+  private ExprNodeDesc generateFilterTagExpression(int[] filterMap, List<ExprNodeDesc> filterExprs) {

Review Comment:
   Could you please add some docs describing what this tag expression should look like?
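
To make the input layout concrete: `getFilterExprMap` in the quoted diff reads `filterMap[pos]` as a flat array of (tag, length) pairs, where each `length` says how many consecutive entries of the flat filter list belong to `tag`. A minimal sketch of that grouping follows. It uses plain strings in place of `ExprNodeDesc`, and the class and method names (`FilterMapLayout`, `groupByTag`) are hypothetical, not part of Hive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not Hive code: models the (tag, length) layout with strings.
class FilterMapLayout {

    // Splits a flat filter list into per-tag groups using (tag, length) pairs.
    static Map<Integer, List<String>> groupByTag(int[] filterInfo, List<String> filters) {
        Map<Integer, List<String>> groups = new LinkedHashMap<>();
        int offset = 0;
        for (int i = 0; i + 1 < filterInfo.length; i += 2) {
            int tag = filterInfo[i];
            int length = filterInfo[i + 1];
            if (length > 0) {
                // Copy the slice so the group does not alias the input list.
                groups.put(tag, new ArrayList<>(filters.subList(offset, offset + length)));
            }
            offset += length;
        }
        return groups;
    }

    public static void main(String[] args) {
        // Two filters apply against tag 0, one filter applies against tag 2.
        int[] filterInfo = {0, 2, 2, 1};
        List<String> filters = Arrays.asList("a.x > 1", "a.y < 5", "a.z = 3");
        System.out.println(groupByTag(filterInfo, filters));
    }
}
```

In the quoted diff, a group with more than one filter is combined with `ExprNodeDescUtils.and(...)`, so each tag ends up with a single condition.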



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ExtendParentReduceSinkOfMapJoin.java:
##########
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+public class ExtendParentReduceSinkOfMapJoin implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() != null) {

Review Comment:
   nit: Could you please invert this `if` condition, like
   ```
       int[][] filterMap = mapJoinDesc.getFilterMap();
       if (filterMap == null) {
         return null;
       }
   ```
   since the body is long and there is no `else` clause?



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ExtendParentReduceSinkOfMapJoin.java:
##########
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+public class ExtendParentReduceSinkOfMapJoin implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() != null) {
+      int[][] filterMap = mapJoinDesc.getFilterMap();
+
+      // 1. Extend ReduceSinkOperator if its output is filtered by MapJoinOperator.
+      for (byte pos = 0; pos < filterMap.length; pos++) {
+        if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+          continue;
+        }
+
+        ExprNodeDesc filterTagExpr =
+            generateFilterTagExpression(filterMap[pos], mapJoinDesc.getFilters().get(pos));
+
+        // Note that the parent RS for the given pos is retrieved in a different way in MapJoinProcessor.
+        // TODO: MapJoinProcessor.convertMapJoin() fixes the order of parent operators.
+        //  Do other callers also fix the order, as MapJoinProcessor.convertMapJoin() does?
+        ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+        ReduceSinkDesc pRsConf = parent.getConf();
+
+        // MapJoinProcessor.getMapJoinDesc() replaces filter expressions with backtracked ones if
+        // adjustParentsChildren is true.
+        // As of now, ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin() is the only function that
+        // calls this method with adjustParentsChildren = false. Therefore, we backtrack filter expressions
+        // only if MapJoinDesc.isDynamicPartitionHashJoin is true, which is also the unique property of
+        // ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin().
+        ExprNodeDesc mapSideFilterTagExpr;
+        if (mapJoinDesc.isDynamicPartitionHashJoin()) {
+          mapSideFilterTagExpr = ExprNodeDescUtils.backtrack(filterTagExpr, mapJoinOp, parent);
+        } else {
+          mapSideFilterTagExpr = filterTagExpr;
+        }
+        String filterColumnName = "_filterTag";
+
+        pRsConf.getValueCols().add(mapSideFilterTagExpr);
+        pRsConf.getOutputValueColumnNames().add(filterColumnName);
+        pRsConf.getColumnExprMap()
+            .put(Utilities.ReduceField.VALUE + "." + filterColumnName, mapSideFilterTagExpr);
+
+        ColumnInfo filterTagColumnInfo =
+            new ColumnInfo(Utilities.ReduceField.VALUE + "." + filterColumnName, shortType, "", false);
+        parent.getSchema().getSignature().add(filterTagColumnInfo);
+
+        TableDesc newTableDesc =
+            PlanUtils.getReduceValueTableDesc(
+                PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "_col"));
+        pRsConf.setValueSerializeInfo(newTableDesc);
+      }
+
+      // 2. Update MapJoinOperator's valueFilteredTableDescs.
+      // Unlike HashTableSinkOperator used in MR engine, Tez engine directly passes rows from RS to MapJoin.
+      // Therefore, RS's writer and MapJoin's reader should have the same TableDesc. We create valueTableDesc
+      // here again because it can be different from RS's valueSerializeInfo due to ColumnPruner.
+      List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>();
+      for (byte pos = 0; pos < mapJoinOp.getParentOperators().size(); pos++) {
+        TableDesc tableDesc;
+
+        if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+          // We did not change corresponding parent operator. Use the original tableDesc.
+          tableDesc = mapJoinDesc.getValueFilteredTblDescs().get(pos);
+        } else {
+          // Create a new TableDesc based on corresponding parent RSOperator.
+          ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+          ReduceSinkDesc pRsConf = parent.getConf();
+
+          tableDesc =
+              PlanUtils.getMapJoinValueTableDesc(
+                  PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "mapjoinvalue"));
+        }
+
+        newMapJoinValueFilteredTableDescs.add(tableDesc);
+      }
+      mapJoinDesc.setValueFilteredTblDescs(newMapJoinValueFilteredTableDescs);
+    }
+
+    return null;
+  }
+
+  private ExprNodeDesc generateFilterTagExpression(int[] filterMap, List<ExprNodeDesc> filterExprs) {
+    ExprNodeDesc filterTagExpr = new ExprNodeConstantDesc(shortType, (short) 0);
+    Map<Byte, ExprNodeDesc> filterExprMap = getFilterExprMap(filterMap, filterExprs);
+
+    for (Map.Entry<Byte, ExprNodeDesc> entry: filterExprMap.entrySet()) {
+      ExprNodeDesc filterTagMaskExpr = generateFilterTagMask(entry.getKey(), entry.getValue());
+
+      if (filterTagExpr instanceof ExprNodeConstantDesc) {
+        filterTagExpr = filterTagMaskExpr;
+      } else {
+        List<ExprNodeDesc> plusArgs = new ArrayList<>(2);
+        plusArgs.add(filterTagMaskExpr);
+        plusArgs.add(filterTagExpr);
+        filterTagExpr = new ExprNodeGenericFuncDesc(shortType, new GenericUDFOPPlus(), plusArgs);
+      }
+    }
+
+    return filterTagExpr;
+  }
+
+  private Map<Byte, ExprNodeDesc> getFilterExprMap(int[] filterInfo, List<ExprNodeDesc> filterExprs) {
+    Map<Byte, ExprNodeDesc> filterExprMap = new HashMap<>();
+
+    int exprListOffset = 0;
+    for (int idx = 0; idx < filterInfo.length; idx = idx + 2) {
+      byte tag = (byte) filterInfo[idx];
+      int length = filterInfo[idx + 1];
+
+      int nextExprOffset = exprListOffset + length;
+      List<ExprNodeDesc> andArgs = filterExprs.subList(exprListOffset, nextExprOffset);
+      exprListOffset = nextExprOffset;
+
+      if (andArgs.size() == 1) {
+        filterExprMap.put(tag, andArgs.get(0));
+      } else if (andArgs.size() > 1) {
+        filterExprMap.put(tag, ExprNodeDescUtils.and(andArgs));
+      }
+    }
+
+    return filterExprMap;
+  }
+
+  private ExprNodeDesc generateFilterTagMask(byte tag, ExprNodeDesc condition) {

Review Comment:
   Could you please elaborate on the purpose of the tag mask and how it is calculated?
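
As a hedged illustration of what such a mask could compute: the body of `generateFilterTagMask` is not quoted in this message, so the sketch below is only a guess based on the imports in the diff (`GenericUDFOPNot`, `UDFToShort`, `GenericUDFOPMultiply`, `GenericUDFOPPlus`) and on the stated goal of expressing `JoinUtil#isFiltered`. It assumes that bit `tag` is set when the combined condition for that tag does not pass, and that a null result counts as not passing. The names `FilterTagSketch`, `tagMask` and `filterTag` are hypothetical.

```java
// Hypothetical model of the filterTag arithmetic on plain values, not Hive code.
class FilterTagSketch {

    // Mask for one tag: toShort(NOT condition) * (1 << tag).
    // Assumption: a null condition result is treated as "did not pass".
    static short tagMask(int tag, Boolean conditionResult) {
        boolean passed = conditionResult != null && conditionResult;
        int notPassed = passed ? 0 : 1;
        return (short) (notPassed * (1 << tag));
    }

    // The filterTag is the sum of the per-tag masks. Each tag contributes a
    // distinct bit, so the sum is equivalent to a bitwise OR.
    static short filterTag(int[] tags, Boolean[] conditionResults) {
        short tagValue = 0;
        for (int i = 0; i < tags.length; i++) {
            tagValue += tagMask(tags[i], conditionResults[i]);
        }
        return tagValue;
    }

    public static void main(String[] args) {
        // The condition against tag 0 passes, the one against tag 2 fails:
        // only bit 2 is set.
        System.out.println(filterTag(new int[] {0, 2}, new Boolean[] {true, false}));
    }
}
```

With these inputs only the mask for tag 2 is non-zero, so the resulting value is 4 (binary 100). A reader of the small-table row can then test bit `tag` to tell whether the row was filtered against that alias.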




[GitHub] [hive] ngsg commented on a diff in pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "ngsg (via GitHub)" <gi...@apache.org>.

ngsg commented on code in PR #4115:
URL: https://github.com/apache/hive/pull/4115#discussion_r1317147965


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FiltertagAppenderProc.java:
##########
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+/**
+ * Append a filterTag computation column to ReduceSinkOperators whose child is MapJoinOperator.
+ * The added column expresses JoinUtil#isFiltered in the form of ExprNode.
+ * The added column should be located at the end of row as CommonJoinOperator expects it.
+ *
+ * This processor only affects small tables of MapJoin that run on Tez engine.
+ * For big table, MapJoinOperator#process() calls CommonJoinOperator#getFilteredValue(), which adds filterTag.
+ * For MapReduce engine, HashTableSinkOperator adds filterTag to every row.
+ */
+public class FiltertagAppenderProc implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() == null) {
+      return null;
+    }
+
+    int[][] filterMap = mapJoinDesc.getFilterMap();
+
+    // 1. Extend ReduceSinkOperator if its output is filtered by MapJoinOperator.
+    for (byte pos = 0; pos < filterMap.length; pos++) {
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        continue;
+      }
+
+      ExprNodeDesc filterTagExpr =
+          generateFilterTagExpression(filterMap[pos], mapJoinDesc.getFilters().get(pos));
+
+      // Note that the parent RS for the given pos is retrieved in a different way in MapJoinProcessor.
+      // TODO: MapJoinProcessor.convertMapJoin() fixes the order of parent operators.
+      //  Do other callers also fix the order, as MapJoinProcessor.convertMapJoin() does?
+      ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+      ReduceSinkDesc pRsConf = parent.getConf();
+
+      // MapJoinProcessor.getMapJoinDesc() replaces filter expressions with backtracked ones if
+      // adjustParentsChildren is true.
+      // As of now, ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin() is the only function that
+      // calls this method with adjustParentsChildren = false. Therefore, we backtrack filter expressions
+      // only if MapJoinDesc.isDynamicPartitionHashJoin is true, which is also the unique property of
+      // ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin().
+      ExprNodeDesc mapSideFilterTagExpr;
+      if (mapJoinDesc.isDynamicPartitionHashJoin()) {
+        mapSideFilterTagExpr = ExprNodeDescUtils.backtrack(filterTagExpr, mapJoinOp, parent);
+      } else {
+        mapSideFilterTagExpr = filterTagExpr;
+      }
+      String filterColumnName = "_filterTag";
+
+      pRsConf.getValueCols().add(mapSideFilterTagExpr);
+      pRsConf.getOutputValueColumnNames().add(filterColumnName);
+      pRsConf.getColumnExprMap()
+          .put(Utilities.ReduceField.VALUE + "." + filterColumnName, mapSideFilterTagExpr);
+
+      ColumnInfo filterTagColumnInfo =
+          new ColumnInfo(Utilities.ReduceField.VALUE + "." + filterColumnName, shortType, "", false);
+      parent.getSchema().getSignature().add(filterTagColumnInfo);
+
+      TableDesc newTableDesc =
+          PlanUtils.getReduceValueTableDesc(
+              PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "_col"));
+      pRsConf.setValueSerializeInfo(newTableDesc);
+    }
+
+    // 2. Update MapJoinOperator's valueFilteredTableDescs.
+    // Unlike HashTableSinkOperator used in MR engine, Tez engine directly passes rows from RS to MapJoin.
+    // Therefore, RS's writer and MapJoin's reader should have the same TableDesc. We create valueTableDesc
+    // here again because it can be different from RS's valueSerializeInfo due to ColumnPruner.
+    List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>();
+    for (byte pos = 0; pos < mapJoinOp.getParentOperators().size(); pos++) {
+      TableDesc tableDesc;
+
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        // We did not change corresponding parent operator. Use the original tableDesc.
+        tableDesc = mapJoinDesc.getValueFilteredTblDescs().get(pos);
+      } else {
+        // Create a new TableDesc based on corresponding parent RSOperator.
+        ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+        ReduceSinkDesc pRsConf = parent.getConf();
+
+        tableDesc =
+            PlanUtils.getMapJoinValueTableDesc(
+                PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "mapjoinvalue"));
+      }
+
+      newMapJoinValueFilteredTableDescs.add(tableDesc);
+    }
+    mapJoinDesc.setValueFilteredTblDescs(newMapJoinValueFilteredTableDescs);
+
+    return null;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses the following method:
+   * JoinUtil#isFiltered(Object, List<ExprNodeEvaluator>, List<ObjectInspector>, int[]).
+   */
+  private ExprNodeDesc generateFilterTagExpression(int[] filterMap, List<ExprNodeDesc> filterExprs) {
+    ExprNodeDesc filterTagExpr = new ExprNodeConstantDesc(shortType, (short) 0);
+    Map<Byte, ExprNodeDesc> filterExprMap = getFilterExprMap(filterMap, filterExprs);
+
+    for (Map.Entry<Byte, ExprNodeDesc> entry: filterExprMap.entrySet()) {
+      ExprNodeDesc filterTagMaskExpr = generateFilterTagMask(entry.getKey(), entry.getValue());
+
+      if (filterTagExpr instanceof ExprNodeConstantDesc) {
+        filterTagExpr = filterTagMaskExpr;
+      } else {
+        List<ExprNodeDesc> plusArgs = new ArrayList<>(2);
+        plusArgs.add(filterTagMaskExpr);
+        plusArgs.add(filterTagExpr);

Review Comment:
   I simplified it.




[GitHub] [hive] github-actions[bot] closed pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] closed pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp
URL: https://github.com/apache/hive/pull/4115



[GitHub] [hive] kasakrisz merged pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "kasakrisz (via GitHub)" <gi...@apache.org>.

kasakrisz merged PR #4115:
URL: https://github.com/apache/hive/pull/4115



[GitHub] [hive] ngsg commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "ngsg (via GitHub)" <gi...@apache.org>.

ngsg commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1720371502

   I rebased and updated the output of `mapjoin_filter_on_outerjoin_tez.q`. 



[GitHub] [hive] kasakrisz commented on a diff in pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "kasakrisz (via GitHub)" <gi...@apache.org>.

kasakrisz commented on code in PR #4115:
URL: https://github.com/apache/hive/pull/4115#discussion_r1316978565


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FiltertagAppenderProc.java:
##########
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+/**
+ * Append a filterTag computation column to ReduceSinkOperators whose child is MapJoinOperator.
+ * The added column expresses JoinUtil#isFiltered in the form of ExprNode.
+ * The added column should be located at the end of row as CommonJoinOperator expects it.
+ *
+ * This processor only affects small tables of MapJoin that run on Tez engine.
+ * For big table, MapJoinOperator#process() calls CommonJoinOperator#getFilteredValue(), which adds filterTag.
+ * For MapReduce engine, HashTableSinkOperator adds filterTag to every row.
+ */
+public class FiltertagAppenderProc implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() == null) {
+      return null;
+    }
+
+    int[][] filterMap = mapJoinDesc.getFilterMap();
+
+    // 1. Extend ReduceSinkOperator if its output is filtered by MapJoinOperator.
+    for (byte pos = 0; pos < filterMap.length; pos++) {
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        continue;
+      }
+
+      ExprNodeDesc filterTagExpr =
+          generateFilterTagExpression(filterMap[pos], mapJoinDesc.getFilters().get(pos));
+
+      // Note that the parent RS for the given pos is retrieved in different way in MapJoinProcessor.
+      // TODO: MapJoinProcessor.convertMapJoin() fixes the order of parent operators.
+      //  Does other callers also fix the order as well as MapJoinProcessor.convertMapJoin()?
+      ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+      ReduceSinkDesc pRsConf = parent.getConf();
+
+      // MapJoinProcessor.getMapJoinDesc() replaces filter expressions with backtracked one if
+      // adjustParentsChildren is true.
+      // As of now, ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin() is the only functions that
+      // calls this method with adjustParentsChildren = false. Therefore, we backtrack filter expressions
+      // only if MapJoinDesc.isDynamicPartitionHashJoin is true, which is also the unique property of
+      // ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin().
+      ExprNodeDesc mapSideFilterTagExpr;
+      if (mapJoinDesc.isDynamicPartitionHashJoin()) {
+        mapSideFilterTagExpr = ExprNodeDescUtils.backtrack(filterTagExpr, mapJoinOp, parent);
+      } else {
+        mapSideFilterTagExpr = filterTagExpr;
+      }
+      String filterColumnName = "_filterTag";
+
+      pRsConf.getValueCols().add(mapSideFilterTagExpr);
+      pRsConf.getOutputValueColumnNames().add(filterColumnName);
+      pRsConf.getColumnExprMap()
+          .put(Utilities.ReduceField.VALUE + "." + filterColumnName, mapSideFilterTagExpr);
+
+      ColumnInfo filterTagColumnInfo =
+          new ColumnInfo(Utilities.ReduceField.VALUE + "." + filterColumnName, shortType, "", false);
+      parent.getSchema().getSignature().add(filterTagColumnInfo);
+
+      TableDesc newTableDesc =
+          PlanUtils.getReduceValueTableDesc(
+              PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "_col"));
+      pRsConf.setValueSerializeInfo(newTableDesc);
+    }
+
+    // 2. Update MapJoinOperator's valueFilteredTableDescs.
+    // Unlike HashTableSinkOperator used in MR engine, Tez engine directly passes rows from RS to MapJoin.
+    // Therefore, RS's writer and MapJoin's reader should have the same TableDesc. We create valueTableDesc
+    // here again because it can be different from RS's valueSerializeInfo due to ColumnPruner.
+    List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>();
+    for (byte pos = 0; pos < mapJoinOp.getParentOperators().size(); pos++) {
+      TableDesc tableDesc;
+
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        // We did not change corresponding parent operator. Use the original tableDesc.
+        tableDesc = mapJoinDesc.getValueFilteredTblDescs().get(pos);
+      } else {
+        // Create a new TableDesc based on corresponding parent RSOperator.
+        ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+        ReduceSinkDesc pRsConf = parent.getConf();
+
+        tableDesc =
+            PlanUtils.getMapJoinValueTableDesc(
+                PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "mapjoinvalue"));
+      }
+
+      newMapJoinValueFilteredTableDescs.add(tableDesc);
+    }
+    mapJoinDesc.setValueFilteredTblDescs(newMapJoinValueFilteredTableDescs);
+
+    return null;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses the following method:
+   * JoinUtil#isFiltered(Object, List<ExprNodeEvaluator>, List<ObjectInspector>, int[]).
+   */
+  private ExprNodeDesc generateFilterTagExpression(int[] filterMap, List<ExprNodeDesc> filterExprs) {
+    ExprNodeDesc filterTagExpr = new ExprNodeConstantDesc(shortType, (short) 0);
+    Map<Byte, ExprNodeDesc> filterExprMap = getFilterExprMap(filterMap, filterExprs);
+
+    for (Map.Entry<Byte, ExprNodeDesc> entry: filterExprMap.entrySet()) {
+      ExprNodeDesc filterTagMaskExpr = generateFilterTagMask(entry.getKey(), entry.getValue());
+
+      if (filterTagExpr instanceof ExprNodeConstantDesc) {
+        filterTagExpr = filterTagMaskExpr;
+      } else {
+        List<ExprNodeDesc> plusArgs = new ArrayList<>(2);
+        plusArgs.add(filterTagMaskExpr);
+        plusArgs.add(filterTagExpr);
+        filterTagExpr = new ExprNodeGenericFuncDesc(shortType, new GenericUDFOPPlus(), plusArgs);
+      }
+    }
+
+    return filterTagExpr;
+  }
+
+  /**
+   * Group filterExprs by tag and merge each of them into a single boolean ExprNodeDesc using AND operator.
+   * filterInfo is repetition of tag and the length of corresponding filter expressions.
+   * For example, filterInfo = {0, 2, 1, 3} means that the first 2 elements in filterExprs belong to tag 0,
+   * and the remaining 3 elements belong to tag 1.
+   */
+  private Map<Byte, ExprNodeDesc> getFilterExprMap(int[] filterInfo, List<ExprNodeDesc> filterExprs) {
+    Map<Byte, ExprNodeDesc> filterExprMap = new HashMap<>();
+
+    int exprListOffset = 0;
+    for (int idx = 0; idx < filterInfo.length; idx = idx + 2) {
+      byte tag = (byte) filterInfo[idx];
+      int length = filterInfo[idx + 1];
+
+      int nextExprOffset = exprListOffset + length;
+      List<ExprNodeDesc> andArgs = filterExprs.subList(exprListOffset, nextExprOffset);
+      exprListOffset = nextExprOffset;
+
+      if (andArgs.size() == 1) {
+        filterExprMap.put(tag, andArgs.get(0));
+      } else if (andArgs.size() > 1) {
+        filterExprMap.put(tag, ExprNodeDescUtils.and(andArgs));
+      }
+    }
+
+    return filterExprMap;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses the following code:
+   *   if (!condition) { return (short) (1 << tag); } else { return (short) 0; }.
+   */
+  private ExprNodeDesc generateFilterTagMask(byte tag, ExprNodeDesc condition) {
+    ExprNodeDesc filterMaskValue = new ExprNodeConstantDesc(shortType, (short) (1 << tag));
+
+    List<ExprNodeDesc> negateArg = new ArrayList<>(1);
+    negateArg.add(condition);
+    ExprNodeDesc negate = new ExprNodeGenericFuncDesc(
+        TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.BOOLEAN_TYPE_NAME),
+        new GenericUDFOPNot(),
+        negateArg);
+
+    GenericUDFBridge toShort = new GenericUDFBridge();
+    toShort.setUdfClassName(UDFToShort.class.getName());
+    toShort.setUdfName(UDFToShort.class.getSimpleName());
+
+    List<ExprNodeDesc> toShortArg = new ArrayList<>(1);
+    toShortArg.add(negate);

Review Comment:
   nit: this can be simplified to
   ```
   List<ExprNodeDesc> toShortArg = Collections.singletonList(negate);
   ```
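
A minimal sketch of what this suggestion amounts to, using `String` in place of `ExprNodeDesc` so that it runs without Hive on the classpath. The class and method names are hypothetical and not part of the patch. Note that `Collections.singletonList` returns an immutable list, so the change is only safe as long as nothing adds to `toShortArg` afterwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the argument-list construction in generateFilterTagMask.
class SingletonArgSketch {
  // Form used in the patch: a mutable one-element list.
  static List<String> viaArrayList(String arg) {
    List<String> args = new ArrayList<>(1);
    args.add(arg);
    return args;
  }

  // Suggested form: an immutable one-element list.
  static List<String> viaSingletonList(String arg) {
    return Collections.singletonList(arg);
  }

  public static void main(String[] unused) {
    // Both lists hold the same single element, so they compare equal.
    System.out.println(viaArrayList("negate").equals(viaSingletonList("negate")));
  }
}
```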



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FiltertagAppenderProc.java:
##########
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+/**
+ * Append a filterTag computation column to ReduceSinkOperators whose child is MapJoinOperator.
+ * The added column expresses JoinUtil#isFiltered in the form of ExprNode.
+ * The added column should be located at the end of row as CommonJoinOperator expects it.
+ *
+ * This processor only affects small tables of MapJoin that run on Tez engine.
+ * For big table, MapJoinOperator#process() calls CommonJoinOperator#getFilteredValue(), which adds filterTag.
+ * For MapReduce engine, HashTableSinkOperator adds filterTag to every row.
+ */
+public class FiltertagAppenderProc implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() == null) {
+      return null;
+    }
+
+    int[][] filterMap = mapJoinDesc.getFilterMap();
+
+    // 1. Extend ReduceSinkOperator if its output is filtered by MapJoinOperator.
+    for (byte pos = 0; pos < filterMap.length; pos++) {
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        continue;
+      }
+
+      ExprNodeDesc filterTagExpr =
+          generateFilterTagExpression(filterMap[pos], mapJoinDesc.getFilters().get(pos));
+
+      // Note that the parent RS for the given pos is retrieved in different way in MapJoinProcessor.
+      // TODO: MapJoinProcessor.convertMapJoin() fixes the order of parent operators.
+      //  Does other callers also fix the order as well as MapJoinProcessor.convertMapJoin()?
+      ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+      ReduceSinkDesc pRsConf = parent.getConf();
+
+      // MapJoinProcessor.getMapJoinDesc() replaces filter expressions with backtracked one if
+      // adjustParentsChildren is true.
+      // As of now, ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin() is the only functions that
+      // calls this method with adjustParentsChildren = false. Therefore, we backtrack filter expressions
+      // only if MapJoinDesc.isDynamicPartitionHashJoin is true, which is also the unique property of
+      // ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin().
+      ExprNodeDesc mapSideFilterTagExpr;
+      if (mapJoinDesc.isDynamicPartitionHashJoin()) {
+        mapSideFilterTagExpr = ExprNodeDescUtils.backtrack(filterTagExpr, mapJoinOp, parent);
+      } else {
+        mapSideFilterTagExpr = filterTagExpr;
+      }
+      String filterColumnName = "_filterTag";
+
+      pRsConf.getValueCols().add(mapSideFilterTagExpr);
+      pRsConf.getOutputValueColumnNames().add(filterColumnName);
+      pRsConf.getColumnExprMap()
+          .put(Utilities.ReduceField.VALUE + "." + filterColumnName, mapSideFilterTagExpr);
+
+      ColumnInfo filterTagColumnInfo =
+          new ColumnInfo(Utilities.ReduceField.VALUE + "." + filterColumnName, shortType, "", false);
+      parent.getSchema().getSignature().add(filterTagColumnInfo);
+
+      TableDesc newTableDesc =
+          PlanUtils.getReduceValueTableDesc(
+              PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "_col"));
+      pRsConf.setValueSerializeInfo(newTableDesc);
+    }
+
+    // 2. Update MapJoinOperator's valueFilteredTableDescs.
+    // Unlike HashTableSinkOperator used in MR engine, Tez engine directly passes rows from RS to MapJoin.
+    // Therefore, RS's writer and MapJoin's reader should have the same TableDesc. We create valueTableDesc
+    // here again because it can be different from RS's valueSerializeInfo due to ColumnPruner.
+    List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>();
+    for (byte pos = 0; pos < mapJoinOp.getParentOperators().size(); pos++) {
+      TableDesc tableDesc;
+
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        // We did not change corresponding parent operator. Use the original tableDesc.
+        tableDesc = mapJoinDesc.getValueFilteredTblDescs().get(pos);
+      } else {
+        // Create a new TableDesc based on corresponding parent RSOperator.
+        ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+        ReduceSinkDesc pRsConf = parent.getConf();
+
+        tableDesc =
+            PlanUtils.getMapJoinValueTableDesc(
+                PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "mapjoinvalue"));
+      }
+
+      newMapJoinValueFilteredTableDescs.add(tableDesc);
+    }
+    mapJoinDesc.setValueFilteredTblDescs(newMapJoinValueFilteredTableDescs);
+
+    return null;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses the following method:
+   * JoinUtil#isFiltered(Object, List<ExprNodeEvaluator>, List<ObjectInspector>, int[]).
+   */
+  private ExprNodeDesc generateFilterTagExpression(int[] filterMap, List<ExprNodeDesc> filterExprs) {
+    ExprNodeDesc filterTagExpr = new ExprNodeConstantDesc(shortType, (short) 0);
+    Map<Byte, ExprNodeDesc> filterExprMap = getFilterExprMap(filterMap, filterExprs);
+
+    for (Map.Entry<Byte, ExprNodeDesc> entry: filterExprMap.entrySet()) {
+      ExprNodeDesc filterTagMaskExpr = generateFilterTagMask(entry.getKey(), entry.getValue());
+
+      if (filterTagExpr instanceof ExprNodeConstantDesc) {
+        filterTagExpr = filterTagMaskExpr;
+      } else {
+        List<ExprNodeDesc> plusArgs = new ArrayList<>(2);
+        plusArgs.add(filterTagMaskExpr);
+        plusArgs.add(filterTagExpr);
+        filterTagExpr = new ExprNodeGenericFuncDesc(shortType, new GenericUDFOPPlus(), plusArgs);
+      }
+    }
+
+    return filterTagExpr;
+  }
+
+  /**
+   * Group filterExprs by tag and merge each of them into a single boolean ExprNodeDesc using AND operator.
+   * filterInfo is repetition of tag and the length of corresponding filter expressions.
+   * For example, filterInfo = {0, 2, 1, 3} means that the first 2 elements in filterExprs belong to tag 0,
+   * and the remaining 3 elements belong to tag 1.
+   */
+  private Map<Byte, ExprNodeDesc> getFilterExprMap(int[] filterInfo, List<ExprNodeDesc> filterExprs) {
+    Map<Byte, ExprNodeDesc> filterExprMap = new HashMap<>();
+
+    int exprListOffset = 0;
+    for (int idx = 0; idx < filterInfo.length; idx = idx + 2) {
+      byte tag = (byte) filterInfo[idx];
+      int length = filterInfo[idx + 1];
+
+      int nextExprOffset = exprListOffset + length;
+      List<ExprNodeDesc> andArgs = filterExprs.subList(exprListOffset, nextExprOffset);
+      exprListOffset = nextExprOffset;
+
+      if (andArgs.size() == 1) {
+        filterExprMap.put(tag, andArgs.get(0));
+      } else if (andArgs.size() > 1) {
+        filterExprMap.put(tag, ExprNodeDescUtils.and(andArgs));
+      }
+    }
+
+    return filterExprMap;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses the following code:
+   *   if (!condition) { return (short) (1 << tag); } else { return (short) 0; }.
+   */
+  private ExprNodeDesc generateFilterTagMask(byte tag, ExprNodeDesc condition) {
+    ExprNodeDesc filterMaskValue = new ExprNodeConstantDesc(shortType, (short) (1 << tag));
+
+    List<ExprNodeDesc> negateArg = new ArrayList<>(1);
+    negateArg.add(condition);
+    ExprNodeDesc negate = new ExprNodeGenericFuncDesc(
+        TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.BOOLEAN_TYPE_NAME),
+        new GenericUDFOPNot(),
+        negateArg);
+
+    GenericUDFBridge toShort = new GenericUDFBridge();
+    toShort.setUdfClassName(UDFToShort.class.getName());
+    toShort.setUdfName(UDFToShort.class.getSimpleName());
+
+    List<ExprNodeDesc> toShortArg = new ArrayList<>(1);
+    toShortArg.add(negate);
+    ExprNodeDesc conditionAsShort = new ExprNodeGenericFuncDesc(shortType, toShort, toShortArg);
+
+    List<ExprNodeDesc> multiplyArgs = new ArrayList<>(2);
+    multiplyArgs.add(conditionAsShort);
+    multiplyArgs.add(filterMaskValue);

Review Comment:
   nit: this can be simplified to
   ```
   List<ExprNodeDesc> multiplyArgs = Arrays.asList(conditionAsShort, filterMaskValue);
   ```
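
For readers following the hunk above, here is a rough plain-Java equivalent of the expression tree that `generateFilterTagMask` builds, namely `toShort(!condition) * (1 << tag)`. It is a sketch under one assumption: that the short cast maps `true` to 1 and `false` to 0. The class name `FilterTagMaskSketch` is hypothetical.

```java
// Hypothetical plain-Java rendering of the mask expression; not Hive code.
class FilterTagMaskSketch {
  // Returns (1 << tag) when the filter condition fails, and 0 when it passes.
  static short mask(int tag, boolean condition) {
    // Assumed semantics of the boolean-to-short cast: true -> 1, false -> 0.
    short conditionAsShort = (short) (!condition ? 1 : 0);
    short filterMaskValue = (short) (1 << tag);
    return (short) (conditionAsShort * filterMaskValue);
  }

  public static void main(String[] unused) {
    System.out.println(mask(0, false)); // condition failed for tag 0
    System.out.println(mask(1, false)); // condition failed for tag 1
    System.out.println(mask(1, true));  // condition passed, no bit set
  }
}
```

The multiplication acts as a branch-free `if`: a passing condition multiplies the mask by 0, and a failing one leaves it unchanged.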



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FiltertagAppenderProc.java:
##########
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+/**
+ * Append a filterTag computation column to ReduceSinkOperators whose child is MapJoinOperator.
+ * The added column expresses JoinUtil#isFiltered in the form of ExprNode.
+ * The added column should be located at the end of row as CommonJoinOperator expects it.
+ *
+ * This processor only affects small tables of MapJoin that run on Tez engine.
+ * For big table, MapJoinOperator#process() calls CommonJoinOperator#getFilteredValue(), which adds filterTag.
+ * For MapReduce engine, HashTableSinkOperator adds filterTag to every row.
+ */
+public class FiltertagAppenderProc implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() == null) {
+      return null;
+    }
+
+    int[][] filterMap = mapJoinDesc.getFilterMap();
+
+    // 1. Extend ReduceSinkOperator if its output is filtered by MapJoinOperator.
+    for (byte pos = 0; pos < filterMap.length; pos++) {
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        continue;
+      }
+
+      ExprNodeDesc filterTagExpr =
+          generateFilterTagExpression(filterMap[pos], mapJoinDesc.getFilters().get(pos));
+
+      // Note that the parent RS for the given pos is retrieved in different way in MapJoinProcessor.
+      // TODO: MapJoinProcessor.convertMapJoin() fixes the order of parent operators.
+      //  Does other callers also fix the order as well as MapJoinProcessor.convertMapJoin()?
+      ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+      ReduceSinkDesc pRsConf = parent.getConf();
+
+      // MapJoinProcessor.getMapJoinDesc() replaces filter expressions with backtracked one if
+      // adjustParentsChildren is true.
+      // As of now, ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin() is the only functions that
+      // calls this method with adjustParentsChildren = false. Therefore, we backtrack filter expressions
+      // only if MapJoinDesc.isDynamicPartitionHashJoin is true, which is also the unique property of
+      // ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin().
+      ExprNodeDesc mapSideFilterTagExpr;
+      if (mapJoinDesc.isDynamicPartitionHashJoin()) {
+        mapSideFilterTagExpr = ExprNodeDescUtils.backtrack(filterTagExpr, mapJoinOp, parent);
+      } else {
+        mapSideFilterTagExpr = filterTagExpr;
+      }
+      String filterColumnName = "_filterTag";
+
+      pRsConf.getValueCols().add(mapSideFilterTagExpr);
+      pRsConf.getOutputValueColumnNames().add(filterColumnName);
+      pRsConf.getColumnExprMap()
+          .put(Utilities.ReduceField.VALUE + "." + filterColumnName, mapSideFilterTagExpr);
+
+      ColumnInfo filterTagColumnInfo =
+          new ColumnInfo(Utilities.ReduceField.VALUE + "." + filterColumnName, shortType, "", false);
+      parent.getSchema().getSignature().add(filterTagColumnInfo);
+
+      TableDesc newTableDesc =
+          PlanUtils.getReduceValueTableDesc(
+              PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "_col"));
+      pRsConf.setValueSerializeInfo(newTableDesc);
+    }
+
+    // 2. Update MapJoinOperator's valueFilteredTableDescs.
+    // Unlike HashTableSinkOperator used in MR engine, Tez engine directly passes rows from RS to MapJoin.
+    // Therefore, RS's writer and MapJoin's reader should have the same TableDesc. We create valueTableDesc
+    // here again because it can be different from RS's valueSerializeInfo due to ColumnPruner.
+    List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>();
+    for (byte pos = 0; pos < mapJoinOp.getParentOperators().size(); pos++) {
+      TableDesc tableDesc;
+
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        // We did not change corresponding parent operator. Use the original tableDesc.
+        tableDesc = mapJoinDesc.getValueFilteredTblDescs().get(pos);
+      } else {
+        // Create a new TableDesc based on corresponding parent RSOperator.
+        ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+        ReduceSinkDesc pRsConf = parent.getConf();
+
+        tableDesc =
+            PlanUtils.getMapJoinValueTableDesc(
+                PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "mapjoinvalue"));
+      }
+
+      newMapJoinValueFilteredTableDescs.add(tableDesc);
+    }
+    mapJoinDesc.setValueFilteredTblDescs(newMapJoinValueFilteredTableDescs);
+
+    return null;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses the following method:
+   * JoinUtil#isFiltered(Object, List<ExprNodeEvaluator>, List<ObjectInspector>, int[]).
+   */
+  private ExprNodeDesc generateFilterTagExpression(int[] filterMap, List<ExprNodeDesc> filterExprs) {
+    ExprNodeDesc filterTagExpr = new ExprNodeConstantDesc(shortType, (short) 0);
+    Map<Byte, ExprNodeDesc> filterExprMap = getFilterExprMap(filterMap, filterExprs);
+
+    for (Map.Entry<Byte, ExprNodeDesc> entry: filterExprMap.entrySet()) {
+      ExprNodeDesc filterTagMaskExpr = generateFilterTagMask(entry.getKey(), entry.getValue());
+
+      if (filterTagExpr instanceof ExprNodeConstantDesc) {
+        filterTagExpr = filterTagMaskExpr;
+      } else {
+        List<ExprNodeDesc> plusArgs = new ArrayList<>(2);
+        plusArgs.add(filterTagMaskExpr);
+        plusArgs.add(filterTagExpr);

Review Comment:
   nit: this can be simplified to
   ```
   List<ExprNodeDesc> plusArgs = Arrays.asList(filterTagMaskExpr, filterTagExpr);
   ```
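
As a sanity check on what the summed expression computes, the following sketch evaluates the same logic on plain booleans. It assumes the `filterInfo` layout described in the Javadoc of `getFilterExprMap` (pairs of tag and number of filter expressions) and takes the already-evaluated filter results as input. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java rendering of generateFilterTagExpression; not Hive code.
class FilterTagSumSketch {
  // filterInfo holds (tag, length) pairs; results holds one boolean per filter expression.
  static short filterTag(int[] filterInfo, List<Boolean> results) {
    int tagValue = 0;
    int offset = 0;
    for (int idx = 0; idx < filterInfo.length; idx += 2) {
      int tag = filterInfo[idx];
      int length = filterInfo[idx + 1];
      // AND together all filter results that belong to this tag.
      boolean allPassed = true;
      for (boolean passed : results.subList(offset, offset + length)) {
        allPassed &= passed;
      }
      offset += length;
      // Add this tag's bit when its combined condition fails.
      if (!allPassed) {
        tagValue += 1 << tag;
      }
    }
    return (short) tagValue;
  }

  public static void main(String[] unused) {
    int[] filterInfo = {0, 2, 1, 3};
    // The second filter of tag 0 fails, and all filters of tag 1 pass.
    System.out.println(filterTag(filterInfo, Arrays.asList(true, false, true, true, true)));
  }
}
```

Because each tag appears at most once and contributes a distinct bit, adding the masks gives the same result as a bitwise OR.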



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FiltertagAppenderProc.java:
##########
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+/**
+ * Append a filterTag computation column to ReduceSinkOperators whose child is MapJoinOperator.
+ * The added column expresses JoinUtil#isFiltered in the form of ExprNode.
+ * The added column should be located at the end of row as CommonJoinOperator expects it.
+ *
+ * This processor only affects small tables of MapJoin that run on Tez engine.
+ * For big table, MapJoinOperator#process() calls CommonJoinOperator#getFilteredValue(), which adds filterTag.
+ * For MapReduce engine, HashTableSinkOperator adds filterTag to every row.
+ */
+public class FiltertagAppenderProc implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() == null) {
+      return null;
+    }
+
+    int[][] filterMap = mapJoinDesc.getFilterMap();
+
+    // 1. Extend ReduceSinkOperator if its output is filtered by MapJoinOperator.
+    for (byte pos = 0; pos < filterMap.length; pos++) {
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        continue;
+      }
+
+      ExprNodeDesc filterTagExpr =
+          generateFilterTagExpression(filterMap[pos], mapJoinDesc.getFilters().get(pos));
+
+      // Note that the parent RS for the given pos is retrieved in different way in MapJoinProcessor.
+      // TODO: MapJoinProcessor.convertMapJoin() fixes the order of parent operators.
+      //  Does other callers also fix the order as well as MapJoinProcessor.convertMapJoin()?
+      ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+      ReduceSinkDesc pRsConf = parent.getConf();
+
+      // MapJoinProcessor.getMapJoinDesc() replaces filter expressions with backtracked one if
+      // adjustParentsChildren is true.
+      // As of now, ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin() is the only functions that
+      // calls this method with adjustParentsChildren = false. Therefore, we backtrack filter expressions
+      // only if MapJoinDesc.isDynamicPartitionHashJoin is true, which is also the unique property of
+      // ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin().
+      ExprNodeDesc mapSideFilterTagExpr;
+      if (mapJoinDesc.isDynamicPartitionHashJoin()) {
+        mapSideFilterTagExpr = ExprNodeDescUtils.backtrack(filterTagExpr, mapJoinOp, parent);
+      } else {
+        mapSideFilterTagExpr = filterTagExpr;
+      }
+      String filterColumnName = "_filterTag";
+
+      pRsConf.getValueCols().add(mapSideFilterTagExpr);
+      pRsConf.getOutputValueColumnNames().add(filterColumnName);
+      pRsConf.getColumnExprMap()
+          .put(Utilities.ReduceField.VALUE + "." + filterColumnName, mapSideFilterTagExpr);
+
+      ColumnInfo filterTagColumnInfo =
+          new ColumnInfo(Utilities.ReduceField.VALUE + "." + filterColumnName, shortType, "", false);
+      parent.getSchema().getSignature().add(filterTagColumnInfo);
+
+      TableDesc newTableDesc =
+          PlanUtils.getReduceValueTableDesc(
+              PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "_col"));
+      pRsConf.setValueSerializeInfo(newTableDesc);
+    }
+
+    // 2. Update MapJoinOperator's valueFilteredTableDescs.
+    // Unlike HashTableSinkOperator used in MR engine, Tez engine directly passes rows from RS to MapJoin.
+    // Therefore, RS's writer and MapJoin's reader should have the same TableDesc. We create valueTableDesc
+    // here again because it can be different from RS's valueSerializeInfo due to ColumnPruner.
+    List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>();
+    for (byte pos = 0; pos < mapJoinOp.getParentOperators().size(); pos++) {
+      TableDesc tableDesc;
+
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        // We did not change corresponding parent operator. Use the original tableDesc.
+        tableDesc = mapJoinDesc.getValueFilteredTblDescs().get(pos);
+      } else {
+        // Create a new TableDesc based on corresponding parent RSOperator.
+        ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+        ReduceSinkDesc pRsConf = parent.getConf();
+
+        tableDesc =
+            PlanUtils.getMapJoinValueTableDesc(
+                PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "mapjoinvalue"));
+      }
+
+      newMapJoinValueFilteredTableDescs.add(tableDesc);
+    }
+    mapJoinDesc.setValueFilteredTblDescs(newMapJoinValueFilteredTableDescs);
+
+    return null;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses the following method:
+   * JoinUtil#isFiltered(Object, List<ExprNodeEvaluator>, List<ObjectInspector>, int[]).
+   */
+  private ExprNodeDesc generateFilterTagExpression(int[] filterMap, List<ExprNodeDesc> filterExprs) {
+    ExprNodeDesc filterTagExpr = new ExprNodeConstantDesc(shortType, (short) 0);
+    Map<Byte, ExprNodeDesc> filterExprMap = getFilterExprMap(filterMap, filterExprs);
+
+    for (Map.Entry<Byte, ExprNodeDesc> entry: filterExprMap.entrySet()) {
+      ExprNodeDesc filterTagMaskExpr = generateFilterTagMask(entry.getKey(), entry.getValue());
+
+      if (filterTagExpr instanceof ExprNodeConstantDesc) {
+        filterTagExpr = filterTagMaskExpr;
+      } else {
+        List<ExprNodeDesc> plusArgs = new ArrayList<>(2);
+        plusArgs.add(filterTagMaskExpr);
+        plusArgs.add(filterTagExpr);
+        filterTagExpr = new ExprNodeGenericFuncDesc(shortType, new GenericUDFOPPlus(), plusArgs);
+      }
+    }
+
+    return filterTagExpr;
+  }
+
+  /**
+   * Group filterExprs by tag and merge each of them into a single boolean ExprNodeDesc using AND operator.
+   * filterInfo is repetition of tag and the length of corresponding filter expressions.
+   * For example, filterInfo = {0, 2, 1, 3} means that the first 2 elements in filterExprs belong to tag 0,
+   * and the remaining 3 elements belong to tag 1.
+   */
+  private Map<Byte, ExprNodeDesc> getFilterExprMap(int[] filterInfo, List<ExprNodeDesc> filterExprs) {
+    Map<Byte, ExprNodeDesc> filterExprMap = new HashMap<>();
+
+    int exprListOffset = 0;
+    for (int idx = 0; idx < filterInfo.length; idx = idx + 2) {
+      byte tag = (byte) filterInfo[idx];
+      int length = filterInfo[idx + 1];
+
+      int nextExprOffset = exprListOffset + length;
+      List<ExprNodeDesc> andArgs = filterExprs.subList(exprListOffset, nextExprOffset);
+      exprListOffset = nextExprOffset;
+
+      if (andArgs.size() == 1) {
+        filterExprMap.put(tag, andArgs.get(0));
+      } else if (andArgs.size() > 1) {
+        filterExprMap.put(tag, ExprNodeDescUtils.and(andArgs));
+      }
+    }
+
+    return filterExprMap;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses thw following code:
+   *   if (!condition) { return (short) (1 << tag) } else { return (short) 0; }.

Review Comment:
   Thanks for this clarification. This is a good example and we should keep it, but I think it is also worth mentioning that the actual expression is something like
   ```
   ((short) !(<bool condition>)) << <tag>
   ```
   and that we are exploiting the fact that `UDFToShort(Boolean)` returns
   * `1` when the input logical expression is `true`, and
   * `0` when it is `false`.
   
   WDYT?
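
   To make the equivalence concrete, here is a minimal plain-Java sketch (no Hive classes). The class and method names are hypothetical, and `? 1 : 0` only stands in for the boolean-to-short cast described above. The visible part of the diff builds a constant `(short) (1 << tag)` and imports `GenericUDFOPMultiply`, so the generated expression may multiply by that constant rather than shift; that is an assumption, and for a 0/1 operand both forms give the same value.

   ```java
   public class FilterTagMaskSketch {

     // Documented semantics: if (!condition) return (short) (1 << tag); else return 0.
     static short maskWithBranch(boolean condition, int tag) {
       if (!condition) {
         return (short) (1 << tag);
       }
       return (short) 0;
     }

     // Branch-free form: turn the negated condition into 0/1, then shift it to the tag's bit.
     static short maskWithShift(boolean condition, int tag) {
       int negatedAsInt = !condition ? 1 : 0; // stand-in for the boolean-to-short cast
       return (short) (negatedAsInt << tag);
     }

     // Same value via multiplication by the constant (1 << tag) (assumed variant).
     static short maskWithMultiply(boolean condition, int tag) {
       int negatedAsInt = !condition ? 1 : 0;
       return (short) (negatedAsInt * (1 << tag));
     }

     public static void main(String[] args) {
       // Check that all three forms agree for a few small tags.
       for (int tag = 0; tag < 8; tag++) {
         for (boolean condition : new boolean[] {true, false}) {
           short expected = maskWithBranch(condition, tag);
           if (expected != maskWithShift(condition, tag)
               || expected != maskWithMultiply(condition, tag)) {
             throw new AssertionError("mismatch at tag " + tag);
           }
         }
       }
       System.out.println(maskWithShift(false, 2)); // filter failed for tag 2: prints 4
       System.out.println(maskWithShift(true, 2));  // filter passed: prints 0
     }
   }
   ```

   The branch-free form matters here because the mask has to be expressed as an expression tree evaluated by the ReduceSink, not as Java control flow.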



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FiltertagAppenderProc.java:
##########
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+/**
+ * Append a filterTag computation column to ReduceSinkOperators whose child is MapJoinOperator.
+ * The added column expresses JoinUtil#isFiltered in the form of ExprNode.
+ * The added column should be located at the end of row as CommonJoinOperator expects it.
+ *
+ * This processor only affects small tables of MapJoin that run on Tez engine.
+ * For big table, MapJoinOperator#process() calls CommonJoinOperator#getFilteredValue(), which adds filterTag.
+ * For MapReduce engine, HashTableSinkOperator adds filterTag to every row.
+ */
+public class FiltertagAppenderProc implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() == null) {
+      return null;
+    }
+
+    int[][] filterMap = mapJoinDesc.getFilterMap();
+
+    // 1. Extend ReduceSinkoperator if it's output is filtered by MapJoinOperator.
+    for (byte pos = 0; pos < filterMap.length; pos++) {
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        continue;
+      }
+
+      ExprNodeDesc filterTagExpr =
+          generateFilterTagExpression(filterMap[pos], mapJoinDesc.getFilters().get(pos));
+
+      // Note that the parent RS for the given pos is retrieved in different way in MapJoinProcessor.
+      // TODO: MapJoinProcessor.convertMapJoin() fixes the order of parent operators.
+      //  Does other callers also fix the order as well as MapJoinProcessor.convertMapJoin()?
+      ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+      ReduceSinkDesc pRsConf = parent.getConf();
+
+      // MapJoinProcessor.getMapJoinDesc() replaces filter expressions with backtracked one if
+      // adjustParentsChildren is true.
+      // As of now, ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin() is the only functions that
+      // calls this method with adjustParentsChildren = false. Therefore, we backtrack filter expressions
+      // only if MapJoinDesc.isDynamicPartitionHashJoin is true, which is also the unique property of
+      // ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin().
+      ExprNodeDesc mapSideFilterTagExpr;
+      if (mapJoinDesc.isDynamicPartitionHashJoin()) {
+        mapSideFilterTagExpr = ExprNodeDescUtils.backtrack(filterTagExpr, mapJoinOp, parent);
+      } else {
+        mapSideFilterTagExpr = filterTagExpr;
+      }
+      String filterColumnName = "_filterTag";
+
+      pRsConf.getValueCols().add(mapSideFilterTagExpr);
+      pRsConf.getOutputValueColumnNames().add(filterColumnName);
+      pRsConf.getColumnExprMap()
+          .put(Utilities.ReduceField.VALUE + "." + filterColumnName, mapSideFilterTagExpr);
+
+      ColumnInfo filterTagColumnInfo =
+          new ColumnInfo(Utilities.ReduceField.VALUE + "." + filterColumnName, shortType, "", false);
+      parent.getSchema().getSignature().add(filterTagColumnInfo);
+
+      TableDesc newTableDesc =
+          PlanUtils.getReduceValueTableDesc(
+              PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "_col"));
+      pRsConf.setValueSerializeInfo(newTableDesc);
+    }
+
+    // 2. Update MapJoinOperator's valueFilteredTableDescs.
+    // Unlike HashTableSinkOperator used in MR engine, Tez engine directly passes rows from RS to MapJoin.
+    // Therefore, RS's writer and MapJoin's reader should have the same TableDesc. We create valueTableDesc
+    // here again because it can be different from RS's valueSerializeInfo due to ColumnPruner.
+    List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>();
+    for (byte pos = 0; pos < mapJoinOp.getParentOperators().size(); pos++) {
+      TableDesc tableDesc;
+
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        // We did not change corresponding parent operator. Use the original tableDesc.
+        tableDesc = mapJoinDesc.getValueFilteredTblDescs().get(pos);
+      } else {
+        // Create a new TableDesc based on corresponding parent RSOperator.
+        ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+        ReduceSinkDesc pRsConf = parent.getConf();
+
+        tableDesc =
+            PlanUtils.getMapJoinValueTableDesc(
+                PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "mapjoinvalue"));
+      }
+
+      newMapJoinValueFilteredTableDescs.add(tableDesc);
+    }
+    mapJoinDesc.setValueFilteredTblDescs(newMapJoinValueFilteredTableDescs);
+
+    return null;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses the following method:
+   * JoinUtil#isFiltered(Object, List<ExprNodeEvaluator>, List<ObjectInspector>, int[]).
+   */
+  private ExprNodeDesc generateFilterTagExpression(int[] filterMap, List<ExprNodeDesc> filterExprs) {
+    ExprNodeDesc filterTagExpr = new ExprNodeConstantDesc(shortType, (short) 0);
+    Map<Byte, ExprNodeDesc> filterExprMap = getFilterExprMap(filterMap, filterExprs);
+
+    for (Map.Entry<Byte, ExprNodeDesc> entry: filterExprMap.entrySet()) {
+      ExprNodeDesc filterTagMaskExpr = generateFilterTagMask(entry.getKey(), entry.getValue());
+
+      if (filterTagExpr instanceof ExprNodeConstantDesc) {
+        filterTagExpr = filterTagMaskExpr;
+      } else {
+        List<ExprNodeDesc> plusArgs = new ArrayList<>(2);
+        plusArgs.add(filterTagMaskExpr);
+        plusArgs.add(filterTagExpr);
+        filterTagExpr = new ExprNodeGenericFuncDesc(shortType, new GenericUDFOPPlus(), plusArgs);
+      }
+    }
+
+    return filterTagExpr;
+  }
+
+  /**
+   * Group filterExprs by tag and merge each of them into a single boolean ExprNodeDesc using AND operator.
+   * filterInfo is repetition of tag and the length of corresponding filter expressions.
+   * For example, filterInfo = {0, 2, 1, 3} means that the first 2 elements in filterExprs belong to tag 0,
+   * and the remaining 3 elements belong to tag 1.
+   */
+  private Map<Byte, ExprNodeDesc> getFilterExprMap(int[] filterInfo, List<ExprNodeDesc> filterExprs) {
+    Map<Byte, ExprNodeDesc> filterExprMap = new HashMap<>();
+
+    int exprListOffset = 0;
+    for (int idx = 0; idx < filterInfo.length; idx = idx + 2) {
+      byte tag = (byte) filterInfo[idx];
+      int length = filterInfo[idx + 1];
+
+      int nextExprOffset = exprListOffset + length;
+      List<ExprNodeDesc> andArgs = filterExprs.subList(exprListOffset, nextExprOffset);
+      exprListOffset = nextExprOffset;
+
+      if (andArgs.size() == 1) {
+        filterExprMap.put(tag, andArgs.get(0));
+      } else if (andArgs.size() > 1) {
+        filterExprMap.put(tag, ExprNodeDescUtils.and(andArgs));
+      }
+    }
+
+    return filterExprMap;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses thw following code:

Review Comment:
   typo: thw -> the



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FiltertagAppenderProc.java:
##########
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+/**
+ * Append a filterTag computation column to ReduceSinkOperators whose child is MapJoinOperator.
+ * The added column expresses JoinUtil#isFiltered in the form of ExprNode.
+ * The added column should be located at the end of row as CommonJoinOperator expects it.
+ *
+ * This processor only affects small tables of MapJoin that run on Tez engine.
+ * For big table, MapJoinOperator#process() calls CommonJoinOperator#getFilteredValue(), which adds filterTag.
+ * For MapReduce engine, HashTableSinkOperator adds filterTag to every row.
+ */
+public class FiltertagAppenderProc implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() == null) {
+      return null;
+    }
+
+    int[][] filterMap = mapJoinDesc.getFilterMap();
+
+    // 1. Extend ReduceSinkoperator if it's output is filtered by MapJoinOperator.
+    for (byte pos = 0; pos < filterMap.length; pos++) {
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        continue;
+      }
+
+      ExprNodeDesc filterTagExpr =
+          generateFilterTagExpression(filterMap[pos], mapJoinDesc.getFilters().get(pos));
+
+      // Note that the parent RS for the given pos is retrieved in different way in MapJoinProcessor.
+      // TODO: MapJoinProcessor.convertMapJoin() fixes the order of parent operators.
+      //  Does other callers also fix the order as well as MapJoinProcessor.convertMapJoin()?
+      ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+      ReduceSinkDesc pRsConf = parent.getConf();
+
+      // MapJoinProcessor.getMapJoinDesc() replaces filter expressions with backtracked one if
+      // adjustParentsChildren is true.
+      // As of now, ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin() is the only functions that
+      // calls this method with adjustParentsChildren = false. Therefore, we backtrack filter expressions
+      // only if MapJoinDesc.isDynamicPartitionHashJoin is true, which is also the unique property of
+      // ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin().
+      ExprNodeDesc mapSideFilterTagExpr;
+      if (mapJoinDesc.isDynamicPartitionHashJoin()) {
+        mapSideFilterTagExpr = ExprNodeDescUtils.backtrack(filterTagExpr, mapJoinOp, parent);
+      } else {
+        mapSideFilterTagExpr = filterTagExpr;
+      }
+      String filterColumnName = "_filterTag";
+
+      pRsConf.getValueCols().add(mapSideFilterTagExpr);
+      pRsConf.getOutputValueColumnNames().add(filterColumnName);
+      pRsConf.getColumnExprMap()
+          .put(Utilities.ReduceField.VALUE + "." + filterColumnName, mapSideFilterTagExpr);
+
+      ColumnInfo filterTagColumnInfo =
+          new ColumnInfo(Utilities.ReduceField.VALUE + "." + filterColumnName, shortType, "", false);
+      parent.getSchema().getSignature().add(filterTagColumnInfo);
+
+      TableDesc newTableDesc =
+          PlanUtils.getReduceValueTableDesc(
+              PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "_col"));
+      pRsConf.setValueSerializeInfo(newTableDesc);
+    }
+
+    // 2. Update MapJoinOperator's valueFilteredTableDescs.
+    // Unlike HashTableSinkOperator used in MR engine, Tez engine directly passes rows from RS to MapJoin.
+    // Therefore, RS's writer and MapJoin's reader should have the same TableDesc. We create valueTableDesc
+    // here again because it can be different from RS's valueSerializeInfo due to ColumnPruner.
+    List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>();
+    for (byte pos = 0; pos < mapJoinOp.getParentOperators().size(); pos++) {

Review Comment:
   nit: Do we know the size of this list up front? If so, how about presizing it:
   ```
   List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>(mapJoinOp.getParentOperators().size());
   ```
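
   For reference, a small standalone sketch of what the presizing does. The class name, method name, and the `String` placeholders are hypothetical stand-ins for the `TableDesc` list. The constructor argument is only a capacity hint, so the list still starts empty and is filled with `add()` exactly as before.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class PresizedListSketch {

     // Builds a list of `count` placeholder entries, reserving capacity up front.
     static List<String> buildDescs(int count) {
       List<String> descs = new ArrayList<>(count); // capacity hint only; size() is still 0 here
       for (int pos = 0; pos < count; pos++) {
         descs.add("tableDesc-" + pos);
       }
       return descs;
     }

     public static void main(String[] args) {
       System.out.println(new ArrayList<String>(3).size()); // prints 0
       System.out.println(buildDescs(3).size());            // prints 3
     }
   }
   ```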



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FiltertagAppenderProc.java:
##########
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+/**
+ * Append a filterTag computation column to ReduceSinkOperators whose child is MapJoinOperator.
+ * The added column expresses JoinUtil#isFiltered in the form of ExprNode.
+ * The added column should be located at the end of row as CommonJoinOperator expects it.
+ *
+ * This processor only affects small tables of MapJoin that run on Tez engine.
+ * For big table, MapJoinOperator#process() calls CommonJoinOperator#getFilteredValue(), which adds filterTag.
+ * For MapReduce engine, HashTableSinkOperator adds filterTag to every row.
+ */
+public class FiltertagAppenderProc implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() == null) {
+      return null;
+    }
+
+    int[][] filterMap = mapJoinDesc.getFilterMap();
+
+    // 1. Extend ReduceSinkoperator if it's output is filtered by MapJoinOperator.
+    for (byte pos = 0; pos < filterMap.length; pos++) {
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        continue;
+      }
+
+      ExprNodeDesc filterTagExpr =
+          generateFilterTagExpression(filterMap[pos], mapJoinDesc.getFilters().get(pos));
+
+      // Note that the parent RS for the given pos is retrieved in different way in MapJoinProcessor.
+      // TODO: MapJoinProcessor.convertMapJoin() fixes the order of parent operators.
+      //  Does other callers also fix the order as well as MapJoinProcessor.convertMapJoin()?
+      ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+      ReduceSinkDesc pRsConf = parent.getConf();
+
+      // MapJoinProcessor.getMapJoinDesc() replaces filter expressions with backtracked one if
+      // adjustParentsChildren is true.
+      // As of now, ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin() is the only functions that
+      // calls this method with adjustParentsChildren = false. Therefore, we backtrack filter expressions
+      // only if MapJoinDesc.isDynamicPartitionHashJoin is true, which is also the unique property of
+      // ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin().
+      ExprNodeDesc mapSideFilterTagExpr;
+      if (mapJoinDesc.isDynamicPartitionHashJoin()) {
+        mapSideFilterTagExpr = ExprNodeDescUtils.backtrack(filterTagExpr, mapJoinOp, parent);
+      } else {
+        mapSideFilterTagExpr = filterTagExpr;
+      }
+      String filterColumnName = "_filterTag";
+
+      pRsConf.getValueCols().add(mapSideFilterTagExpr);
+      pRsConf.getOutputValueColumnNames().add(filterColumnName);
+      pRsConf.getColumnExprMap()
+          .put(Utilities.ReduceField.VALUE + "." + filterColumnName, mapSideFilterTagExpr);
+
+      ColumnInfo filterTagColumnInfo =
+          new ColumnInfo(Utilities.ReduceField.VALUE + "." + filterColumnName, shortType, "", false);
+      parent.getSchema().getSignature().add(filterTagColumnInfo);
+
+      TableDesc newTableDesc =
+          PlanUtils.getReduceValueTableDesc(
+              PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "_col"));
+      pRsConf.setValueSerializeInfo(newTableDesc);
+    }
+
+    // 2. Update MapJoinOperator's valueFilteredTableDescs.
+    // Unlike HashTableSinkOperator used in MR engine, Tez engine directly passes rows from RS to MapJoin.
+    // Therefore, RS's writer and MapJoin's reader should have the same TableDesc. We create valueTableDesc
+    // here again because it can be different from RS's valueSerializeInfo due to ColumnPruner.
+    List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>();
+    for (byte pos = 0; pos < mapJoinOp.getParentOperators().size(); pos++) {
+      TableDesc tableDesc;
+
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        // We did not change corresponding parent operator. Use the original tableDesc.
+        tableDesc = mapJoinDesc.getValueFilteredTblDescs().get(pos);
+      } else {
+        // Create a new TableDesc based on corresponding parent RSOperator.
+        ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+        ReduceSinkDesc pRsConf = parent.getConf();
+
+        tableDesc =
+            PlanUtils.getMapJoinValueTableDesc(
+                PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "mapjoinvalue"));
+      }
+
+      newMapJoinValueFilteredTableDescs.add(tableDesc);
+    }
+    mapJoinDesc.setValueFilteredTblDescs(newMapJoinValueFilteredTableDescs);
+
+    return null;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses the following method:
+   * JoinUtil#isFiltered(Object, List<ExprNodeEvaluator>, List<ObjectInspector>, int[]).
+   */
+  private ExprNodeDesc generateFilterTagExpression(int[] filterMap, List<ExprNodeDesc> filterExprs) {
+    ExprNodeDesc filterTagExpr = new ExprNodeConstantDesc(shortType, (short) 0);
+    Map<Byte, ExprNodeDesc> filterExprMap = getFilterExprMap(filterMap, filterExprs);
+
+    for (Map.Entry<Byte, ExprNodeDesc> entry: filterExprMap.entrySet()) {
+      ExprNodeDesc filterTagMaskExpr = generateFilterTagMask(entry.getKey(), entry.getValue());
+
+      if (filterTagExpr instanceof ExprNodeConstantDesc) {
+        filterTagExpr = filterTagMaskExpr;
+      } else {
+        List<ExprNodeDesc> plusArgs = new ArrayList<>(2);
+        plusArgs.add(filterTagMaskExpr);
+        plusArgs.add(filterTagExpr);
+        filterTagExpr = new ExprNodeGenericFuncDesc(shortType, new GenericUDFOPPlus(), plusArgs);
+      }
+    }
+
+    return filterTagExpr;
+  }
+
+  /**
+   * Group filterExprs by tag and merge each of them into a single boolean ExprNodeDesc using AND operator.
+   * filterInfo is repetition of tag and the length of corresponding filter expressions.
+   * For example, filterInfo = {0, 2, 1, 3} means that the first 2 elements in filterExprs belong to tag 0,
+   * and the remaining 3 elements belong to tag 1.
+   */
+  private Map<Byte, ExprNodeDesc> getFilterExprMap(int[] filterInfo, List<ExprNodeDesc> filterExprs) {
+    Map<Byte, ExprNodeDesc> filterExprMap = new HashMap<>();
+
+    int exprListOffset = 0;
+    for (int idx = 0; idx < filterInfo.length; idx = idx + 2) {
+      byte tag = (byte) filterInfo[idx];
+      int length = filterInfo[idx + 1];
+
+      int nextExprOffset = exprListOffset + length;
+      List<ExprNodeDesc> andArgs = filterExprs.subList(exprListOffset, nextExprOffset);
+      exprListOffset = nextExprOffset;
+
+      if (andArgs.size() == 1) {
+        filterExprMap.put(tag, andArgs.get(0));
+      } else if (andArgs.size() > 1) {
+        filterExprMap.put(tag, ExprNodeDescUtils.and(andArgs));
+      }
+    }
+
+    return filterExprMap;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses thw following code:
+   *   if (!condition) { return (short) (1 << tag) } else { return (short) 0; }.
+   */
+  private ExprNodeDesc generateFilterTagMask(byte tag, ExprNodeDesc condition) {
+    ExprNodeDesc filterMaskValue = new ExprNodeConstantDesc(shortType, (short) (1 << tag));
+
+    List<ExprNodeDesc> negateArg = new ArrayList<>(1);
+    negateArg.add(condition);

Review Comment:
   nit: this can be simplified to
   ```
   List<ExprNodeDesc> negateArg = Collections.singletonList(condition);
   ```
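
   One caveat worth checking before applying this: `Collections.singletonList` returns an immutable list, so the swap is only safe if nothing downstream adds to or modifies the argument list. Whether the expression descriptor mutates its children is not visible in this diff, so treat that as an open assumption. A minimal standalone sketch of the difference (the class name and `String` elements are hypothetical stand-ins):

   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;

   public class SingletonListSketch {

     // Returns true if the list accepts an extra element, false if it rejects mutation.
     static boolean acceptsAdd(List<String> list) {
       try {
         list.add("extra");
         return true;
       } catch (UnsupportedOperationException e) {
         return false;
       }
     }

     public static void main(String[] args) {
       List<String> mutable = new ArrayList<>(1);
       mutable.add("condition");
       List<String> immutable = Collections.singletonList("condition");

       System.out.println(mutable.equals(immutable)); // same single element: prints true
       System.out.println(acceptsAdd(mutable));       // prints true
       System.out.println(acceptsAdd(immutable));     // prints false
     }
   }
   ```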



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org

[GitHub] [hive] ayushtkn commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "ayushtkn (via GitHub)" <gi...@apache.org>.

ayushtkn commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1699154374

   @amansinha100, do you have any pointers on this? Could you help check?
   cc. @aturoczy 



[GitHub] [hive] kasakrisz commented on a diff in pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "kasakrisz (via GitHub)" <gi...@apache.org>.

kasakrisz commented on code in PR #4115:
URL: https://github.com/apache/hive/pull/4115#discussion_r1317050773


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ExtendParentReduceSinkOfMapJoin.java:
##########
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+public class ExtendParentReduceSinkOfMapJoin implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() != null) {
+      int[][] filterMap = mapJoinDesc.getFilterMap();
+
+      // 1. Extend ReduceSinkOperator if its output is filtered by MapJoinOperator.
+      for (byte pos = 0; pos < filterMap.length; pos++) {
+        if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+          continue;
+        }
+
+        ExprNodeDesc filterTagExpr =
+            generateFilterTagExpression(filterMap[pos], mapJoinDesc.getFilters().get(pos));
+
+        // Note that the parent RS for the given pos is retrieved in a different way in MapJoinProcessor.
+        // TODO: MapJoinProcessor.convertMapJoin() fixes the order of parent operators.
+        //  Do other callers also fix the order, as MapJoinProcessor.convertMapJoin() does?
+        ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+        ReduceSinkDesc pRsConf = parent.getConf();
+
+        // MapJoinProcessor.getMapJoinDesc() replaces filter expressions with backtracked ones if
+        // adjustParentsChildren is true.
+        // As of now, ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin() is the only function that
+        // calls this method with adjustParentsChildren = false. Therefore, we backtrack filter expressions
+        // only if MapJoinDesc.isDynamicPartitionHashJoin is true, which is also the unique property of
+        // ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin().
+        ExprNodeDesc mapSideFilterTagExpr;
+        if (mapJoinDesc.isDynamicPartitionHashJoin()) {
+          mapSideFilterTagExpr = ExprNodeDescUtils.backtrack(filterTagExpr, mapJoinOp, parent);
+        } else {
+          mapSideFilterTagExpr = filterTagExpr;
+        }
+        String filterColumnName = "_filterTag";
+
+        pRsConf.getValueCols().add(mapSideFilterTagExpr);
+        pRsConf.getOutputValueColumnNames().add(filterColumnName);
+        pRsConf.getColumnExprMap()
+            .put(Utilities.ReduceField.VALUE + "." + filterColumnName, mapSideFilterTagExpr);
+
+        ColumnInfo filterTagColumnInfo =
+            new ColumnInfo(Utilities.ReduceField.VALUE + "." + filterColumnName, shortType, "", false);
+        parent.getSchema().getSignature().add(filterTagColumnInfo);
+
+        TableDesc newTableDesc =
+            PlanUtils.getReduceValueTableDesc(
+                PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "_col"));
+        pRsConf.setValueSerializeInfo(newTableDesc);
+      }
+
+      // 2. Update MapJoinOperator's valueFilteredTableDescs.
+      // Unlike HashTableSinkOperator used in MR engine, Tez engine directly passes rows from RS to MapJoin.
+      // Therefore, RS's writer and MapJoin's reader should have the same TableDesc. We create valueTableDesc
+      // here again because it can be different from RS's valueSerializeInfo due to ColumnPruner.
+      List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>();
+      for (byte pos = 0; pos < mapJoinOp.getParentOperators().size(); pos++) {
+        TableDesc tableDesc;
+
+        if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+          // We did not change corresponding parent operator. Use the original tableDesc.
+          tableDesc = mapJoinDesc.getValueFilteredTblDescs().get(pos);
+        } else {
+          // Create a new TableDesc based on corresponding parent RSOperator.
+          ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+          ReduceSinkDesc pRsConf = parent.getConf();
+
+          tableDesc =
+              PlanUtils.getMapJoinValueTableDesc(
+                  PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "mapjoinvalue"));
+        }
+
+        newMapJoinValueFilteredTableDescs.add(tableDesc);
+      }
+      mapJoinDesc.setValueFilteredTblDescs(newMapJoinValueFilteredTableDescs);
+    }
+
+    return null;
+  }
+
+  private ExprNodeDesc generateFilterTagExpression(int[] filterMap, List<ExprNodeDesc> filterExprs) {
+    ExprNodeDesc filterTagExpr = new ExprNodeConstantDesc(shortType, (short) 0);
+    Map<Byte, ExprNodeDesc> filterExprMap = getFilterExprMap(filterMap, filterExprs);
+
+    for (Map.Entry<Byte, ExprNodeDesc> entry: filterExprMap.entrySet()) {
+      ExprNodeDesc filterTagMaskExpr = generateFilterTagMask(entry.getKey(), entry.getValue());
+
+      if (filterTagExpr instanceof ExprNodeConstantDesc) {
+        filterTagExpr = filterTagMaskExpr;
+      } else {
+        List<ExprNodeDesc> plusArgs = new ArrayList<>(2);
+        plusArgs.add(filterTagMaskExpr);
+        plusArgs.add(filterTagExpr);
+        filterTagExpr = new ExprNodeGenericFuncDesc(shortType, new GenericUDFOPPlus(), plusArgs);
+      }
+    }
+
+    return filterTagExpr;
+  }
+
+  private Map<Byte, ExprNodeDesc> getFilterExprMap(int[] filterInfo, List<ExprNodeDesc> filterExprs) {
+    Map<Byte, ExprNodeDesc> filterExprMap = new HashMap<>();
+
+    int exprListOffset = 0;
+    for (int idx = 0; idx < filterInfo.length; idx = idx + 2) {
+      byte tag = (byte) filterInfo[idx];
+      int length = filterInfo[idx + 1];
+
+      int nextExprOffset = exprListOffset + length;
+      List<ExprNodeDesc> andArgs = filterExprs.subList(exprListOffset, nextExprOffset);
+      exprListOffset = nextExprOffset;
+
+      if (andArgs.size() == 1) {
+        filterExprMap.put(tag, andArgs.get(0));
+      } else if (andArgs.size() > 1) {
+        filterExprMap.put(tag, ExprNodeDescUtils.and(andArgs));
+      }
+    }
+
+    return filterExprMap;
+  }
+
+  private ExprNodeDesc generateFilterTagMask(byte tag, ExprNodeDesc condition) {

Review Comment:
   I tried to understand how `isFiltered` in `JoinUtil.java` works.
   IIUC `0` means the row can be forwarded. A value greater than `0` means it should be filtered out.
   The mask records which expressions evaluated to false.
   Is that correct?
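
To make the question concrete, here is a minimal sketch of the encoding as the Javadoc in this PR describes it: `filterInfo` is a sequence of (tag, count) pairs, each tag whose AND-ed condition is false contributes `1 << tag`, and a result of `0` means no filter failed. Plain booleans stand in for evaluated `ExprNodeDesc` results. `FilterTagSketch`, `computeFilterTag`, and `failedForTag` are hypothetical names, and this does not reproduce the actual `JoinUtil` code.

```java
// Hypothetical illustration of the filterTag bitmask; not Hive code.
class FilterTagSketch {

  // filterInfo holds (tag, count) pairs; results holds the outcome of each
  // filter expression, in the same order as filterExprs would be.
  static short computeFilterTag(int[] filterInfo, boolean[] results) {
    short filterTag = 0;
    int offset = 0;
    for (int idx = 0; idx < filterInfo.length; idx += 2) {
      int tag = filterInfo[idx];
      int length = filterInfo[idx + 1];

      // AND together all expressions that belong to this tag.
      boolean condition = true;
      for (int i = offset; i < offset + length; i++) {
        condition &= results[i];
      }
      offset += length;

      // if (!condition) contribute (1 << tag), otherwise contribute 0.
      if (!condition) {
        filterTag |= (short) (1 << tag);
      }
    }
    return filterTag;
  }

  // True if the bit for the given tag is set, i.e. that tag's filter failed.
  static boolean failedForTag(short filterTag, int tag) {
    return (filterTag & (1 << tag)) != 0;
  }

  public static void main(String[] args) {
    // Same layout as the Javadoc example: 2 expressions for tag 0, 3 for tag 1.
    int[] filterInfo = {0, 2, 1, 3};
    boolean[] results = {true, true, true, false, true};

    short filterTag = computeFilterTag(filterInfo, results);
    System.out.println(filterTag);                  // prints 2
    System.out.println(failedForTag(filterTag, 0)); // prints false
    System.out.println(failedForTag(filterTag, 1)); // prints true
  }
}
```

The sketch combines the per-tag masks with bitwise OR, whereas the PR sums them with `GenericUDFOPPlus`; the two agree here because each tag sets a distinct bit.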
   




[GitHub] [hive] sonarcloud[bot] commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.

sonarcloud[bot] commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1708401812

   Kudos, SonarCloud Quality Gate passed!
   Dashboard: https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4115
   
   0 Bugs (rating A)
   0 Vulnerabilities (rating A)
   0 Security Hotspots (rating A)
   10 Code Smells (rating A)
   
   No Coverage information
   No Duplication information
   
   Warning: The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
   Read more: https://docs.sonarcloud.io/appendices/scanner-environment/



[GitHub] [hive] ngsg commented on a diff in pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "ngsg (via GitHub)" <gi...@apache.org>.

ngsg commented on code in PR #4115:
URL: https://github.com/apache/hive/pull/4115#discussion_r1316781474


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ExtendParentReduceSinkOfMapJoin.java:
##########
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+public class ExtendParentReduceSinkOfMapJoin implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() != null) {

Review Comment:
   OK, I inverted the condition and moved the body of the if-clause outside the if.
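
The refactoring described here is the usual guard-clause pattern. A minimal sketch, using a hypothetical method that counts non-null entries of a `filterMap`-like array instead of the real `process()` body:

```java
// Hypothetical illustration of inverting a null check into an early return.
class GuardClauseSketch {

  // Before: the whole body is nested inside the null check.
  static int countNested(int[][] filterMap) {
    int count = 0;
    if (filterMap != null) {
      for (int[] entry : filterMap) {
        if (entry != null) {
          count++;
        }
      }
    }
    return count;
  }

  // After: the condition is inverted and the body moves out of the if.
  static int countWithGuard(int[][] filterMap) {
    if (filterMap == null) {
      return 0;
    }

    int count = 0;
    for (int[] entry : filterMap) {
      if (entry != null) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    int[][] filterMap = {null, {0, 1}, {1, 2}};
    // Both variants return the same result for the same input.
    System.out.println(countNested(filterMap) == countWithGuard(filterMap));
  }
}
```

Behavior is unchanged; the early return only removes one level of indentation from the main logic.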




[GitHub] [hive] sonarcloud[bot] commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.

sonarcloud[bot] commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1716076053

   Kudos, SonarCloud Quality Gate passed!
   Dashboard: https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4115
   
   0 Bugs (rating A)
   0 Vulnerabilities (rating A)
   0 Security Hotspots (rating A)
   10 Code Smells (rating A)
   
   No Coverage information
   No Duplication information
   
   Warning: The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
   Read more: https://docs.sonarcloud.io/appendices/scanner-environment/



[GitHub] [hive] sonarcloud[bot] commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.

sonarcloud[bot] commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1654678889

   Kudos, SonarCloud Quality Gate passed!
   Dashboard: https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4115
   
   0 Bugs (rating A)
   0 Vulnerabilities (rating A)
   0 Security Hotspots (rating A)
   19 Code Smells (rating A)
   
   No Coverage information
   No Duplication information



[GitHub] [hive] ngsg commented on a diff in pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "ngsg (via GitHub)" <gi...@apache.org>.

ngsg commented on code in PR #4115:
URL: https://github.com/apache/hive/pull/4115#discussion_r1316779528


##########
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryStruct.java:
##########
@@ -221,7 +221,7 @@ public void init(BinaryComparable src) {
       fieldBytes = src.getBytes();
       int length = src.getLength();
       byte nullByte = fieldBytes[0];
-      int lastFieldByteEnd = 1, fieldStart = -1, fieldLength = -1;

Review Comment:
   fieldStart and fieldLength are not intended to be used as local variables; they should be set to proper values in init() so that the caller of getShort() retrieves the correct field from the BinaryStruct.
   I split the field declaration into separate lines.
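
The underlying pitfall can be shown in isolation. In the sketch below, a local declaration with the same name as a field shadows it, so the field keeps its old value after initialization. `ShadowingSketch` and its methods are hypothetical stand-ins and do not reproduce `LazyBinaryStruct`.

```java
// Hypothetical illustration of a local variable shadowing a field.
class ShadowingSketch {
  int fieldStart = -1;
  int fieldLength = -1;

  // Buggy: the declaration creates new locals that shadow the fields,
  // so the fields are never updated.
  void initShadowing(int start, int length) {
    int fieldStart = start, fieldLength = length;
  }

  // Fixed: plain assignments write to the fields.
  void initAssigning(int start, int length) {
    fieldStart = start;
    fieldLength = length;
  }

  public static void main(String[] args) {
    ShadowingSketch s = new ShadowingSketch();

    s.initShadowing(3, 2);
    System.out.println(s.fieldStart); // prints -1

    s.initAssigning(3, 2);
    System.out.println(s.fieldStart); // prints 3
  }
}
```

A later reader method that relies on `fieldStart` and `fieldLength` would see the stale `-1` values in the shadowing variant, which is the kind of problem the comment above describes.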




[GitHub] [hive] ngsg commented on a diff in pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "ngsg (via GitHub)" <gi...@apache.org>.

ngsg commented on code in PR #4115:
URL: https://github.com/apache/hive/pull/4115#discussion_r1316783464


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ExtendParentReduceSinkOfMapJoin.java:
##########
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+public class ExtendParentReduceSinkOfMapJoin implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() != null) {
+      int[][] filterMap = mapJoinDesc.getFilterMap();
+
+      // 1. Extend ReduceSinkOperator if its output is filtered by MapJoinOperator.
+      for (byte pos = 0; pos < filterMap.length; pos++) {
+        if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+          continue;
+        }
+
+        ExprNodeDesc filterTagExpr =
+            generateFilterTagExpression(filterMap[pos], mapJoinDesc.getFilters().get(pos));
+
+        // Note that MapJoinProcessor retrieves the parent RS for the given pos in a different way.
+        // TODO: MapJoinProcessor.convertMapJoin() fixes the order of parent operators.
+        //  Do other callers fix the order in the same way as MapJoinProcessor.convertMapJoin()?
+        ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+        ReduceSinkDesc pRsConf = parent.getConf();
+
+        // MapJoinProcessor.getMapJoinDesc() replaces filter expressions with backtracked ones if
+        // adjustParentsChildren is true.
+        // As of now, ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin() is the only function that
+        // calls this method with adjustParentsChildren = false. Therefore, we backtrack filter expressions
+        // only if MapJoinDesc.isDynamicPartitionHashJoin is true, which is a property unique to
+        // ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin().
+        ExprNodeDesc mapSideFilterTagExpr;
+        if (mapJoinDesc.isDynamicPartitionHashJoin()) {
+          mapSideFilterTagExpr = ExprNodeDescUtils.backtrack(filterTagExpr, mapJoinOp, parent);
+        } else {
+          mapSideFilterTagExpr = filterTagExpr;
+        }
+        String filterColumnName = "_filterTag";
+
+        pRsConf.getValueCols().add(mapSideFilterTagExpr);
+        pRsConf.getOutputValueColumnNames().add(filterColumnName);
+        pRsConf.getColumnExprMap()
+            .put(Utilities.ReduceField.VALUE + "." + filterColumnName, mapSideFilterTagExpr);
+
+        ColumnInfo filterTagColumnInfo =
+            new ColumnInfo(Utilities.ReduceField.VALUE + "." + filterColumnName, shortType, "", false);
+        parent.getSchema().getSignature().add(filterTagColumnInfo);
+
+        TableDesc newTableDesc =
+            PlanUtils.getReduceValueTableDesc(
+                PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "_col"));
+        pRsConf.setValueSerializeInfo(newTableDesc);
+      }
+
+      // 2. Update MapJoinOperator's valueFilteredTableDescs.
+      // Unlike HashTableSinkOperator used in MR engine, Tez engine directly passes rows from RS to MapJoin.
+      // Therefore, RS's writer and MapJoin's reader should have the same TableDesc. We create valueTableDesc
+      // here again because it can be different from RS's valueSerializeInfo due to ColumnPruner.
+      List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>();
+      for (byte pos = 0; pos < mapJoinOp.getParentOperators().size(); pos++) {
+        TableDesc tableDesc;
+
+        if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+          // We did not change corresponding parent operator. Use the original tableDesc.
+          tableDesc = mapJoinDesc.getValueFilteredTblDescs().get(pos);
+        } else {
+          // Create a new TableDesc based on corresponding parent RSOperator.
+          ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+          ReduceSinkDesc pRsConf = parent.getConf();
+
+          tableDesc =
+              PlanUtils.getMapJoinValueTableDesc(
+                  PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "mapjoinvalue"));
+        }
+
+        newMapJoinValueFilteredTableDescs.add(tableDesc);
+      }
+      mapJoinDesc.setValueFilteredTblDescs(newMapJoinValueFilteredTableDescs);
+    }
+
+    return null;
+  }
+
+  private ExprNodeDesc generateFilterTagExpression(int[] filterMap, List<ExprNodeDesc> filterExprs) {
+    ExprNodeDesc filterTagExpr = new ExprNodeConstantDesc(shortType, (short) 0);
+    Map<Byte, ExprNodeDesc> filterExprMap = getFilterExprMap(filterMap, filterExprs);
+
+    for (Map.Entry<Byte, ExprNodeDesc> entry: filterExprMap.entrySet()) {
+      ExprNodeDesc filterTagMaskExpr = generateFilterTagMask(entry.getKey(), entry.getValue());
+
+      if (filterTagExpr instanceof ExprNodeConstantDesc) {
+        filterTagExpr = filterTagMaskExpr;
+      } else {
+        List<ExprNodeDesc> plusArgs = new ArrayList<>(2);
+        plusArgs.add(filterTagMaskExpr);
+        plusArgs.add(filterTagExpr);
+        filterTagExpr = new ExprNodeGenericFuncDesc(shortType, new GenericUDFOPPlus(), plusArgs);
+      }
+    }
+
+    return filterTagExpr;
+  }
+
+  private Map<Byte, ExprNodeDesc> getFilterExprMap(int[] filterInfo, List<ExprNodeDesc> filterExprs) {
+    Map<Byte, ExprNodeDesc> filterExprMap = new HashMap<>();
+
+    int exprListOffset = 0;
+    for (int idx = 0; idx < filterInfo.length; idx = idx + 2) {
+      byte tag = (byte) filterInfo[idx];
+      int length = filterInfo[idx + 1];
+
+      int nextExprOffset = exprListOffset + length;
+      List<ExprNodeDesc> andArgs = filterExprs.subList(exprListOffset, nextExprOffset);
+      exprListOffset = nextExprOffset;
+
+      if (andArgs.size() == 1) {
+        filterExprMap.put(tag, andArgs.get(0));
+      } else if (andArgs.size() > 1) {
+        filterExprMap.put(tag, ExprNodeDescUtils.and(andArgs));
+      }
+    }
+
+    return filterExprMap;
+  }
+
+  private ExprNodeDesc generateFilterTagMask(byte tag, ExprNodeDesc condition) {

Review Comment:
   I added a comment that explains the computation performed by the resultant ExprNodeDesc.




[GitHub] [hive] ngsg commented on a diff in pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "ngsg (via GitHub)" <gi...@apache.org>.

ngsg commented on code in PR #4115:
URL: https://github.com/apache/hive/pull/4115#discussion_r1317153361


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FiltertagAppenderProc.java:
##########
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+/**
+ * Append a filterTag computation column to ReduceSinkOperators whose child is MapJoinOperator.
+ * The added column expresses JoinUtil#isFiltered in the form of ExprNode.
+ * The added column should be located at the end of row as CommonJoinOperator expects it.
+ *
+ * This processor only affects small tables of MapJoin that run on Tez engine.
+ * For big table, MapJoinOperator#process() calls CommonJoinOperator#getFilteredValue(), which adds filterTag.
+ * For MapReduce engine, HashTableSinkOperator adds filterTag to every row.
+ */
+public class FiltertagAppenderProc implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() == null) {
+      return null;
+    }
+
+    int[][] filterMap = mapJoinDesc.getFilterMap();
+
+    // 1. Extend ReduceSinkOperator if its output is filtered by MapJoinOperator.
+    for (byte pos = 0; pos < filterMap.length; pos++) {
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        continue;
+      }
+
+      ExprNodeDesc filterTagExpr =
+          generateFilterTagExpression(filterMap[pos], mapJoinDesc.getFilters().get(pos));
+
+      // Note that MapJoinProcessor retrieves the parent RS for the given pos in a different way.
+      // TODO: MapJoinProcessor.convertMapJoin() fixes the order of parent operators.
+      //  Do other callers fix the order in the same way as MapJoinProcessor.convertMapJoin()?
+      ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+      ReduceSinkDesc pRsConf = parent.getConf();
+
+      // MapJoinProcessor.getMapJoinDesc() replaces filter expressions with backtracked ones if
+      // adjustParentsChildren is true.
+      // As of now, ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin() is the only function that
+      // calls this method with adjustParentsChildren = false. Therefore, we backtrack filter expressions
+      // only if MapJoinDesc.isDynamicPartitionHashJoin is true, which is a property unique to
+      // ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin().
+      ExprNodeDesc mapSideFilterTagExpr;
+      if (mapJoinDesc.isDynamicPartitionHashJoin()) {
+        mapSideFilterTagExpr = ExprNodeDescUtils.backtrack(filterTagExpr, mapJoinOp, parent);
+      } else {
+        mapSideFilterTagExpr = filterTagExpr;
+      }
+      String filterColumnName = "_filterTag";
+
+      pRsConf.getValueCols().add(mapSideFilterTagExpr);
+      pRsConf.getOutputValueColumnNames().add(filterColumnName);
+      pRsConf.getColumnExprMap()
+          .put(Utilities.ReduceField.VALUE + "." + filterColumnName, mapSideFilterTagExpr);
+
+      ColumnInfo filterTagColumnInfo =
+          new ColumnInfo(Utilities.ReduceField.VALUE + "." + filterColumnName, shortType, "", false);
+      parent.getSchema().getSignature().add(filterTagColumnInfo);
+
+      TableDesc newTableDesc =
+          PlanUtils.getReduceValueTableDesc(
+              PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "_col"));
+      pRsConf.setValueSerializeInfo(newTableDesc);
+    }
+
+    // 2. Update MapJoinOperator's valueFilteredTableDescs.
+    // Unlike HashTableSinkOperator used in MR engine, Tez engine directly passes rows from RS to MapJoin.
+    // Therefore, RS's writer and MapJoin's reader should have the same TableDesc. We create valueTableDesc
+    // here again because it can be different from RS's valueSerializeInfo due to ColumnPruner.
+    List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>();
+    for (byte pos = 0; pos < mapJoinOp.getParentOperators().size(); pos++) {
+      TableDesc tableDesc;
+
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        // We did not change corresponding parent operator. Use the original tableDesc.
+        tableDesc = mapJoinDesc.getValueFilteredTblDescs().get(pos);
+      } else {
+        // Create a new TableDesc based on corresponding parent RSOperator.
+        ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+        ReduceSinkDesc pRsConf = parent.getConf();
+
+        tableDesc =
+            PlanUtils.getMapJoinValueTableDesc(
+                PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "mapjoinvalue"));
+      }
+
+      newMapJoinValueFilteredTableDescs.add(tableDesc);
+    }
+    mapJoinDesc.setValueFilteredTblDescs(newMapJoinValueFilteredTableDescs);
+
+    return null;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses the following method:
+   * JoinUtil#isFiltered(Object, List<ExprNodeEvaluator>, List<ObjectInspector>, int[]).
+   */
+  private ExprNodeDesc generateFilterTagExpression(int[] filterMap, List<ExprNodeDesc> filterExprs) {
+    ExprNodeDesc filterTagExpr = new ExprNodeConstantDesc(shortType, (short) 0);
+    Map<Byte, ExprNodeDesc> filterExprMap = getFilterExprMap(filterMap, filterExprs);
+
+    for (Map.Entry<Byte, ExprNodeDesc> entry: filterExprMap.entrySet()) {
+      ExprNodeDesc filterTagMaskExpr = generateFilterTagMask(entry.getKey(), entry.getValue());
+
+      if (filterTagExpr instanceof ExprNodeConstantDesc) {
+        filterTagExpr = filterTagMaskExpr;
+      } else {
+        List<ExprNodeDesc> plusArgs = new ArrayList<>(2);
+        plusArgs.add(filterTagMaskExpr);
+        plusArgs.add(filterTagExpr);
+        filterTagExpr = new ExprNodeGenericFuncDesc(shortType, new GenericUDFOPPlus(), plusArgs);
+      }
+    }
+
+    return filterTagExpr;
+  }
+
+  /**
+   * Group filterExprs by tag and merge each of them into a single boolean ExprNodeDesc using AND operator.
+   * filterInfo is a repeated sequence of pairs: a tag, then the number of filter expressions for that tag.
+   * For example, filterInfo = {0, 2, 1, 3} means that the first 2 elements in filterExprs belong to tag 0,
+   * and the remaining 3 elements belong to tag 1.
+   */
+  private Map<Byte, ExprNodeDesc> getFilterExprMap(int[] filterInfo, List<ExprNodeDesc> filterExprs) {
+    Map<Byte, ExprNodeDesc> filterExprMap = new HashMap<>();
+
+    int exprListOffset = 0;
+    for (int idx = 0; idx < filterInfo.length; idx = idx + 2) {
+      byte tag = (byte) filterInfo[idx];
+      int length = filterInfo[idx + 1];
+
+      int nextExprOffset = exprListOffset + length;
+      List<ExprNodeDesc> andArgs = filterExprs.subList(exprListOffset, nextExprOffset);
+      exprListOffset = nextExprOffset;
+
+      if (andArgs.size() == 1) {
+        filterExprMap.put(tag, andArgs.get(0));
+      } else if (andArgs.size() > 1) {
+        filterExprMap.put(tag, ExprNodeDescUtils.and(andArgs));
+      }
+    }
+
+    return filterExprMap;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses the following code:
+   *   if (!condition) { return (short) (1 << tag) } else { return (short) 0; }.

Review Comment:
   I agree that it will be helpful if the actual expression is written. I modified the comment to show both the actual expression and the logically equivalent code.
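
   The logic described in the Javadoc above can be illustrated with a minimal plain-Java sketch. It is not the Hive implementation: plain booleans stand in for the ExprNodeDesc conditions, SQL NULL handling is not modelled, and the names FilterTagSketch, filterTagMask, filterTag and conditionByTag are hypothetical. It only shows how each tag contributes one bit and how the per-tag masks are summed, as generateFilterTagExpression does with GenericUDFOPPlus.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; booleans replace Hive's ExprNodeDesc conditions.
class FilterTagSketch {

    // One tag's contribution: bit `tag` is set only when the condition fails.
    static short filterTagMask(int tag, boolean condition) {
        return condition ? (short) 0 : (short) (1 << tag);
    }

    // Sum the per-tag masks. Every tag appears once and owns a distinct bit,
    // so the sum never carries and equals the bitwise OR of the masks.
    static short filterTag(Map<Integer, Boolean> conditionByTag) {
        short tagBits = 0;
        for (Map.Entry<Integer, Boolean> entry : conditionByTag.entrySet()) {
            tagBits += filterTagMask(entry.getKey(), entry.getValue());
        }
        return tagBits;
    }

    public static void main(String[] args) {
        Map<Integer, Boolean> conditionByTag = new LinkedHashMap<>();
        conditionByTag.put(0, true);   // condition for tag 0 holds: contributes 0
        conditionByTag.put(1, false);  // condition for tag 1 fails: contributes 1 << 1
        conditionByTag.put(2, false);  // condition for tag 2 fails: contributes 1 << 2
        System.out.println(filterTag(conditionByTag)); // prints 6
    }
}
```

   Because the map holds at most one entry per tag, addition is a safe substitute for bitwise OR here; that would not be true if the same tag could contribute twice.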




[GitHub] [hive] sonarcloud[bot] commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.

sonarcloud[bot] commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1688119540

   Kudos, SonarCloud Quality Gate passed! (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4115)

   0 Bugs (rating A)
   0 Vulnerabilities (rating A)
   0 Security Hotspots (rating A)
   11 Code Smells (rating A)

   No Coverage information
   No Duplication information

   Warning: The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
   Read more: https://docs.sonarcloud.io/appendices/scanner-environment/



[GitHub] [hive] ayushtkn commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "ayushtkn (via GitHub)" <gi...@apache.org>.

ayushtkn commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1694407222

   This is on the optimiser side of the code. @zabetak / @jfsii, does this look interesting to either of you?



[GitHub] [hive] sonarcloud[bot] commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.

sonarcloud[bot] commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1720477157

   Kudos, SonarCloud Quality Gate passed! (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4115)

   0 Bugs (rating A)
   0 Vulnerabilities (rating A)
   0 Security Hotspots (rating A)
   10 Code Smells (rating A)

   No Coverage information
   No Duplication information

   Warning: The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
   Read more: https://docs.sonarcloud.io/appendices/scanner-environment/



[GitHub] [hive] sonarcloud[bot] commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.

sonarcloud[bot] commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1517254225

   Kudos, SonarCloud Quality Gate passed! (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4115)

   0 Bugs (rating A)
   0 Vulnerabilities (rating A)
   0 Security Hotspots (rating A)
   11 Code Smells (rating A)

   No Coverage information
   No Duplication information



[GitHub] [hive] github-actions[bot] commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1626385131

   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.



[GitHub] [hive] jfsii commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "jfsii (via GitHub)" <gi...@apache.org>.

jfsii commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1701592682

   @ayushtkn @ngsg This isn't an area I am very familiar with, but I'll see if I can become a bit more familiar and help move this forward.



[GitHub] [hive] ngsg commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "ngsg (via GitHub)" <gi...@apache.org>.

ngsg commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1707736760

   I renamed ExtendParentReduceSinkOfMapJoin to FiltertagAppenderProc and added some comments to explain its behaviour. @kasakrisz Could you please review the changes? Thank you.



[GitHub] [hive] kasakrisz commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "kasakrisz (via GitHub)" <gi...@apache.org>.

kasakrisz commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1720759574

   @ngsg 
   Thanks for updating the patch. Tests passed and I would like to merge it.
   Who is the author? You opened the PR but the jira HIVE-27138 is assigned to @soenkeliebau.



[GitHub] [hive] kasakrisz commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "kasakrisz (via GitHub)" <gi...@apache.org>.

kasakrisz commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1720962443

   No worries. Thanks for the clarification.



[GitHub] [hive] ngsg commented on a diff in pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "ngsg (via GitHub)" <gi...@apache.org>.

ngsg commented on code in PR #4115:
URL: https://github.com/apache/hive/pull/4115#discussion_r1317144719


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FiltertagAppenderProc.java:
##########
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Stack;
+
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.MapJoinOperator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.MapJoinDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.UDFToShort;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNot;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPPlus;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+/**
+ * Append a filterTag computation column to ReduceSinkOperators whose child is MapJoinOperator.
+ * The added column expresses JoinUtil#isFiltered in the form of ExprNode.
+ * The added column should be located at the end of row as CommonJoinOperator expects it.
+ *
+ * This processor only affects the small tables of a MapJoin that runs on the Tez engine.
+ * For the big table, MapJoinOperator#process() calls CommonJoinOperator#getFilteredValue(), which adds filterTag.
+ * For the MapReduce engine, HashTableSinkOperator adds filterTag to every row.
+ */
+public class FiltertagAppenderProc implements SemanticNodeProcessor {
+
+  private final TypeInfo shortType = TypeInfoFactory.getPrimitiveTypeInfo(serdeConstants.SMALLINT_TYPE_NAME);
+
+  @Override
+  public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+      throws SemanticException {
+    MapJoinOperator mapJoinOp = (MapJoinOperator) nd;
+    MapJoinDesc mapJoinDesc = mapJoinOp.getConf();
+
+    if (mapJoinDesc.getFilterMap() == null) {
+      return null;
+    }
+
+    int[][] filterMap = mapJoinDesc.getFilterMap();
+
+    // 1. Extend ReduceSinkOperator if its output is filtered by MapJoinOperator.
+    for (byte pos = 0; pos < filterMap.length; pos++) {
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        continue;
+      }
+
+      ExprNodeDesc filterTagExpr =
+          generateFilterTagExpression(filterMap[pos], mapJoinDesc.getFilters().get(pos));
+
+      // Note that the parent RS for the given pos is retrieved in a different way in MapJoinProcessor.
+      // TODO: MapJoinProcessor.convertMapJoin() fixes the order of parent operators.
+      //  Do other callers also fix the order, as MapJoinProcessor.convertMapJoin() does?
+      ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+      ReduceSinkDesc pRsConf = parent.getConf();
+
+      // MapJoinProcessor.getMapJoinDesc() replaces filter expressions with backtracked ones if
+      // adjustParentsChildren is true.
+      // As of now, ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin() is the only function that
+      // calls this method with adjustParentsChildren = false. Therefore, we backtrack filter expressions
+      // only if MapJoinDesc.isDynamicPartitionHashJoin is true, which is also a property unique to
+      // ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin().
+      ExprNodeDesc mapSideFilterTagExpr;
+      if (mapJoinDesc.isDynamicPartitionHashJoin()) {
+        mapSideFilterTagExpr = ExprNodeDescUtils.backtrack(filterTagExpr, mapJoinOp, parent);
+      } else {
+        mapSideFilterTagExpr = filterTagExpr;
+      }
+      String filterColumnName = "_filterTag";
+
+      pRsConf.getValueCols().add(mapSideFilterTagExpr);
+      pRsConf.getOutputValueColumnNames().add(filterColumnName);
+      pRsConf.getColumnExprMap()
+          .put(Utilities.ReduceField.VALUE + "." + filterColumnName, mapSideFilterTagExpr);
+
+      ColumnInfo filterTagColumnInfo =
+          new ColumnInfo(Utilities.ReduceField.VALUE + "." + filterColumnName, shortType, "", false);
+      parent.getSchema().getSignature().add(filterTagColumnInfo);
+
+      TableDesc newTableDesc =
+          PlanUtils.getReduceValueTableDesc(
+              PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "_col"));
+      pRsConf.setValueSerializeInfo(newTableDesc);
+    }
+
+    // 2. Update MapJoinOperator's valueFilteredTableDescs.
+    // Unlike HashTableSinkOperator used in MR engine, Tez engine directly passes rows from RS to MapJoin.
+    // Therefore, RS's writer and MapJoin's reader should have the same TableDesc. We create valueTableDesc
+    // here again because it can be different from RS's valueSerializeInfo due to ColumnPruner.
+    List<TableDesc> newMapJoinValueFilteredTableDescs = new ArrayList<>();
+    for (byte pos = 0; pos < mapJoinOp.getParentOperators().size(); pos++) {
+      TableDesc tableDesc;
+
+      if (pos == mapJoinDesc.getPosBigTable() || filterMap[pos] == null) {
+        // We did not change corresponding parent operator. Use the original tableDesc.
+        tableDesc = mapJoinDesc.getValueFilteredTblDescs().get(pos);
+      } else {
+        // Create a new TableDesc based on corresponding parent RSOperator.
+        ReduceSinkOperator parent = (ReduceSinkOperator) mapJoinOp.getParentOperators().get(pos);
+        ReduceSinkDesc pRsConf = parent.getConf();
+
+        tableDesc =
+            PlanUtils.getMapJoinValueTableDesc(
+                PlanUtils.getFieldSchemasFromColumnList(pRsConf.getValueCols(), "mapjoinvalue"));
+      }
+
+      newMapJoinValueFilteredTableDescs.add(tableDesc);
+    }
+    mapJoinDesc.setValueFilteredTblDescs(newMapJoinValueFilteredTableDescs);
+
+    return null;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses the following method:
+   * JoinUtil#isFiltered(Object, List<ExprNodeEvaluator>, List<ObjectInspector>, int[]).
+   */
+  private ExprNodeDesc generateFilterTagExpression(int[] filterMap, List<ExprNodeDesc> filterExprs) {
+    ExprNodeDesc filterTagExpr = new ExprNodeConstantDesc(shortType, (short) 0);
+    Map<Byte, ExprNodeDesc> filterExprMap = getFilterExprMap(filterMap, filterExprs);
+
+    for (Map.Entry<Byte, ExprNodeDesc> entry: filterExprMap.entrySet()) {
+      ExprNodeDesc filterTagMaskExpr = generateFilterTagMask(entry.getKey(), entry.getValue());
+
+      if (filterTagExpr instanceof ExprNodeConstantDesc) {
+        filterTagExpr = filterTagMaskExpr;
+      } else {
+        List<ExprNodeDesc> plusArgs = new ArrayList<>(2);
+        plusArgs.add(filterTagMaskExpr);
+        plusArgs.add(filterTagExpr);
+        filterTagExpr = new ExprNodeGenericFuncDesc(shortType, new GenericUDFOPPlus(), plusArgs);
+      }
+    }
+
+    return filterTagExpr;
+  }
+
+  /**
+   * Group filterExprs by tag and merge each group into a single boolean ExprNodeDesc using the AND operator.
+   * filterInfo is a sequence of (tag, length) pairs; length is the number of filter expressions for that tag.
+   * For example, filterInfo = {0, 2, 1, 3} means that the first 2 elements in filterExprs belong to tag 0,
+   * and the remaining 3 elements belong to tag 1.
+   */
+  private Map<Byte, ExprNodeDesc> getFilterExprMap(int[] filterInfo, List<ExprNodeDesc> filterExprs) {
+    Map<Byte, ExprNodeDesc> filterExprMap = new HashMap<>();
+
+    int exprListOffset = 0;
+    for (int idx = 0; idx < filterInfo.length; idx = idx + 2) {
+      byte tag = (byte) filterInfo[idx];
+      int length = filterInfo[idx + 1];
+
+      int nextExprOffset = exprListOffset + length;
+      List<ExprNodeDesc> andArgs = filterExprs.subList(exprListOffset, nextExprOffset);
+      exprListOffset = nextExprOffset;
+
+      if (andArgs.size() == 1) {
+        filterExprMap.put(tag, andArgs.get(0));
+      } else if (andArgs.size() > 1) {
+        filterExprMap.put(tag, ExprNodeDescUtils.and(andArgs));
+      }
+    }
+
+    return filterExprMap;
+  }
+
+  /**
+   * Generate an ExprNodeDesc that expresses thw following code:

Review Comment:
   Thanks for catching that, I fixed it.
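
For readers following the quoted diff, the (tag, length) layout described in the getFilterExprMap Javadoc, and the way the per-tag masks are summed in generateFilterTagExpression, can be sketched in plain Java. This is only an illustration, not code from the patch: the class FilterTagSketch and its methods are hypothetical, booleans stand in for evaluated ExprNodeDesc filters, and the mask encoding (bit `tag` is set when that tag's filters did not all pass) is an assumption, because generateFilterTagMask is cut off in the quoted hunk.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of the Hive code base. */
final class FilterTagSketch {

  /**
   * Splits per-filter results into groups following the (tag, length) layout:
   * filterInfo = {0, 2, 1, 3} assigns the first 2 results to tag 0 and the next 3 to tag 1.
   * Each group is collapsed with AND, mirroring getFilterExprMap in the diff.
   * Assumes each tag appears at most once in filterInfo.
   */
  static Map<Integer, Boolean> groupByTag(int[] filterInfo, List<Boolean> filterResults) {
    Map<Integer, Boolean> passedByTag = new LinkedHashMap<>();
    int offset = 0;
    for (int idx = 0; idx + 1 < filterInfo.length; idx += 2) {
      int tag = filterInfo[idx];
      int length = filterInfo[idx + 1];
      if (length > 0) {
        boolean allPassed = true;
        for (Boolean result : filterResults.subList(offset, offset + length)) {
          // Assumption: a null (SQL NULL) result counts as "not passed".
          allPassed &= Boolean.TRUE.equals(result);
        }
        passedByTag.put(tag, allPassed);
      }
      offset += length;
    }
    return passedByTag;
  }

  /**
   * Assumed encoding: bit `tag` is set when that tag's filters did NOT all pass.
   * Every tag owns a distinct bit, so adding the masks gives the same value as OR-ing them.
   */
  static short computeFilterTag(int[] filterInfo, List<Boolean> filterResults) {
    int filterTag = 0;
    for (Map.Entry<Integer, Boolean> entry : groupByTag(filterInfo, filterResults).entrySet()) {
      int mask = entry.getValue() ? 0 : (1 << entry.getKey());
      filterTag += mask;
    }
    return (short) filterTag;
  }

  public static void main(String[] args) {
    int[] filterInfo = {0, 2, 1, 3};
    List<Boolean> results = Arrays.asList(true, true, true, false, true);
    System.out.println(groupByTag(filterInfo, results));       // prints {0=true, 1=false}
    System.out.println(computeFilterTag(filterInfo, results)); // prints 2
  }
}
```

With the Javadoc's example layout, the two filters of tag 0 pass and one of the three filters of tag 1 fails, so only bit 1 is set. Summing rather than OR-ing matches the GenericUDFOPPlus node built in the diff and is safe only because each tag contributes a distinct bit.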




[GitHub] [hive] sonarcloud[bot] commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.

sonarcloud[bot] commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1708726257

   Kudos, SonarCloud Quality Gate passed! [Quality Gate passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4115)
   
   [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4115&resolved=false&types=BUG) (rating A)  
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4115&resolved=false&types=VULNERABILITY) (rating A)  
   [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4115&resolved=false&types=SECURITY_HOTSPOT) (rating A)  
   [10 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4115&resolved=false&types=CODE_SMELL) (rating A)
   
   [No Coverage information](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4115&metric=coverage&view=list)  
   [No Duplication information](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4115&metric=duplicated_lines_density&view=list)
   
   Warning: The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
   Read more [here](https://docs.sonarcloud.io/appendices/scanner-environment/)
   
   
   



[GitHub] [hive] nrg4878 commented on pull request #4115: HIVE-27138: Extend RSOp to compute filterTag if it has child MapJoinOp

Posted by "nrg4878 (via GitHub)" <gi...@apache.org>.

nrg4878 commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1538394166

   @zabetak @abstractdog Can you please review these if you have a chance? Thank you

