Posted to gitbox@hive.apache.org by "ngsg (via GitHub)" <gi...@apache.org> on 2023/05/19 10:52:39 UTC

[GitHub] [hive] ngsg opened a new pull request, #4043: HIVE-27006: Fix ParallelEdgeFixer

ngsg opened a new pull request, #4043:
URL: https://github.com/apache/hive/pull/4043

   
   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://cwiki.apache.org/confluence/display/Hive/HowToContribute
     2. Ensure that you have created an issue on the Hive project JIRA: https://issues.apache.org/jira/projects/HIVE/summary
     3. Ensure you have added or run the appropriate tests for your PR: 
     4. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]HIVE-XXXXX:  Your PR title ...'.
     5. Be sure to keep the PR description updated to reflect all changes.
     6. Please write your PR title to summarize what this PR proposes.
     7. If possible, provide a concise example to reproduce the issue for a faster review.
   
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   This patch makes ParallelEdgeFixer refer to RowSchema when inverting columns, and makes it update RuntimeValueInfo as well as SemiJoinBranchInfo.
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   The current ParallelEdgeFixer updates SemiJoinBranchInfo but does not update RuntimeValueInfo.
   Since TezCompiler refers to RuntimeValueInfo when adding SemiJoin edges to a Tez DAG, this inconsistency between RuntimeValueInfo and SemiJoinBranchInfo leads to missing SemiJoin edges at Tez runtime.
   
   Another problem with ParallelEdgeFixer is an incorrect result from colMappingInverseKeys().
   In the current implementation, colMappingInverseKeys() depends on Operator.getColumnExprMap(), but this method sometimes returns an empty map even though the Operator contains columns. (The comment on this method also says that it returns only key columns for RS and GBY operators.)
   When this happens, ParallelEdgeFixer inserts a SEL operator without any columns, and its child RS operator eventually fails with a runtime error like the message below.
   ```
   Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field _col0 from []
           at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:384)
           at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
           at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
   ```
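   The guard that prevents this failure can be illustrated with a simplified, self-contained sketch. Plain Java collections stand in for Hive's RowSchema and column-expression map here; the class and method names are illustrative, not Hive's actual API:
   
   ```java
   import java.util.*;
   
   public class InverseCheckSketch {
       // Simplified stand-in for colMappingInverseKeys(): invert the
       // output-column -> input-column mapping, but refuse when the map
       // does not cover every column in the operator's schema.
       static Optional<Map<String, String>> invert(List<String> schemaCols,
                                                   Map<String, String> exprMap) {
           if (!schemaCols.stream().allMatch(exprMap::containsKey)) {
               // e.g. getColumnExprMap() returned an empty or key-only map
               return Optional.empty();
           }
           Map<String, String> inverse = new HashMap<>();
           for (Map.Entry<String, String> e : exprMap.entrySet()) {
               inverse.put(e.getValue(), e.getKey());
           }
           return Optional.of(inverse);
       }
   
       public static void main(String[] args) {
           // Covering map: inversion is safe
           System.out.println(invert(List.of("_col0"), Map.of("_col0", "id")));
           // Empty map (the buggy case): without the guard, the fixer would
           // build a SEL with no columns and the child RS would later fail
           // with "cannot find field _col0 from []"
           System.out.println(invert(List.of("_col0"), Map.of()));
       }
   }
   ```
   
   Returning Optional.empty() instead of a partial mapping pushes the "cannot invert" decision to plan time, rather than letting a malformed SEL operator fail at runtime.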
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description, screenshot and/or a reproducable example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Hive versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   No
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   
   I tested the patch manually on a cluster using the query described in the JIRA issue as well as TPC-DS queries.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "ngsg (via GitHub)" <gi...@apache.org>.
ngsg commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1401482791


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -796,31 +787,29 @@ private void createFinalRsForSemiJoinOp(
           ExprNodeDesc key, String keyBaseAlias, ExprNodeDesc colExpr,
           boolean isHint) throws SemanticException {
     ArrayList<String> gbOutputNames = new ArrayList<>();
-    // One each for min, max and bloom filter
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(0));
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(1));
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(2));
-
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
-    for (int i = 0; i < gbOutputNames.size() - 1; i++) {
-      ExprNodeColumnDesc expr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos++), "", false);
-      rsValueCols.add(expr);
+    Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    List<ColumnInfo> gbySchema = gb.getSchema().getSignature();
+    for (int colPos = 0; colPos < gbySchema.size(); colPos++) {

Review Comment:
   changed to for-each loop
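   For reference, the rewrite amounts to iterating over the schema entries directly instead of by position; a minimal standalone sketch (plain strings standing in for Hive's ColumnInfo objects), applicable when the loop body no longer needs the index itself:
   
   ```java
   import java.util.*;
   
   public class ForEachSketch {
       // For-each traversal: same visit order as an indexed loop,
       // without the colPos bookkeeping
       static List<String> columnsOf(List<String> gbySchema) {
           List<String> out = new ArrayList<>();
           for (String colInfo : gbySchema) {
               out.add(colInfo);
           }
           return out;
       }
   
       public static void main(String[] args) {
           // Visits every column in schema order, like
           // for (int colPos = 0; colPos < gbySchema.size(); colPos++)
           System.out.println(columnsOf(List.of("_col0", "_col1", "_col2")));
       }
   }
   ```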





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ merged PR #4043:
URL: https://github.com/apache/hive/pull/4043




Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388381233


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ParallelEdgeFixer.java:
##########
@@ -324,26 +330,25 @@ private static String extractColumnName(ExprNodeDesc expr) throws SemanticExcept
   public static Optional<Set<String>> colMappingInverseKeys(ReduceSinkOperator rs) {
     Map<String, String> ret = new HashMap<String, String>();
     Map<String, ExprNodeDesc> exprMap = rs.getColumnExprMap();
-    Set<String> neededColumns = new HashSet<String>();
+
+    if (!rs.getSchema().getColumnNames().stream().allMatch(exprMap::containsKey)) {
+      // Cannot invert RS because exprMap does not contain all of the RS's input columns.
+      return Optional.empty();
+    }
+
     try {
       for (Entry<String, ExprNodeDesc> e : exprMap.entrySet()) {
         String columnName = extractColumnName(e.getValue());
         if (rs.getSchema().getColumnInfo(e.getKey()) == null) {
-          // ignore incorrectly mapped columns (if there's any) - but require its input to be present
-          neededColumns.add(columnName);
-        } else {
-          ret.put(columnName, e.getKey());
+          // Cannot invert RS because the column needed by expression is not mapped correctly.
+          return Optional.empty();

Review Comment:
   it looks like `exprMap` might have duplicate columns {columnName} with different mappings {e.getKey()}, so I think we need to restore the previous version.
   How about this:
   ````
       Set<String> neededColumns = new HashSet<String>();
       try {
         for (Entry<String, ExprNodeDesc> e : exprMap.entrySet()) {
           String columnName = extractColumnName(e.getValue());
           if (rs.getSchema().getColumnInfo(e.getKey()) == null) {
             // ignore incorrectly mapped columns (if there's any) - but require its input to be present
             neededColumns.add(columnName);
           } else {
             ret.put(columnName, e.getKey());
           }
         }
         neededColumns.removeAll(ret.keySet());
         if (!neededColumns.isEmpty() || exprMap.isEmpty()) {
           // There is no way to compute all parts of neededColumns
           return Optional.empty();
         }
     }
   
   ````








[GitHub] [hive] sonarcloud[bot] commented on pull request #4043: HIVE-27006: Fix ParallelEdgeFixer

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4043:
URL: https://github.com/apache/hive/pull/4043#issuecomment-1432882343

   Kudos, SonarCloud Quality Gate passed!&nbsp; &nbsp; [![Quality Gate passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png 'Quality Gate passed')](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4043)
   
   [![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG) [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG)  
   [![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY)  
   [![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT)  
   [![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL) [1 Code Smell](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL)
   
   [![No Coverage information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png 'No Coverage information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4043&metric=coverage&view=list) No Coverage information  
   [![No Duplication information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png 'No Duplication information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4043&metric=duplicated_lines_density&view=list) No Duplication information
   
   




Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4043:
URL: https://github.com/apache/hive/pull/4043#issuecomment-1803764186

   Kudos, SonarCloud Quality Gate passed!&nbsp; &nbsp; [![Quality Gate passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png 'Quality Gate passed')](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4043)
   
   [![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG) [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG)  
   [![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY)  
   [![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT)  
   [![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL) [7 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL)
   
   [![No Coverage information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png 'No Coverage information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4043&metric=coverage&view=list) No Coverage information  
   [![No Duplication information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png 'No Duplication information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4043&metric=duplicated_lines_density&view=list) No Duplication information
   
   ![warning](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/message_warning-16px.png 'warning') The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
   Read more [here](https://docs.sonarcloud.io/appendices/scanner-environment/)
   
   
   




Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1387994260


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -678,38 +678,34 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
     ArrayList<ColumnInfo> groupbyColInfos = new ArrayList<ColumnInfo>();
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(0), key.getTypeInfo(), "", false));
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(1), key.getTypeInfo(), "", false));
-    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), key.getTypeInfo(), "", false));
+    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), TypeInfoFactory.binaryTypeInfo, "", false));
 
     GroupByOperator groupByOp = (GroupByOperator)OperatorFactory.getAndMakeChild(
             groupBy, new RowSchema(groupbyColInfos), selectOp);
 
     groupByOp.setColumnExprMap(new HashMap<String, ExprNodeDesc>());
 
     // Get the column names of the aggregations for reduce sink
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
     Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
-    for (int i = 0; i < aggs.size() - 1; i++) {
-      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos), "", false);
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    for (int colPos = 0; colPos < aggs.size(); colPos++) {
+      TypeInfo typInfo = groupbyColInfos.get(colPos).getType();
+      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(typInfo, gbOutputNames.get(colPos), "", false);
       rsValueCols.add(colExpr);
-      columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-      colPos++;
-    }
+      columnExprMap.put(Utilities.ReduceField.VALUE + "." + gbOutputNames.get(colPos), colExpr);
 
-    // Bloom Filter uses binary
-    ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(TypeInfoFactory.binaryTypeInfo,
-        gbOutputNames.get(colPos), "", false);
-    rsValueCols.add(colExpr);
-    columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-    colPos++;
+      ColumnInfo colInfo =

Review Comment:
   why do we need this ref (Utilities.ReduceField.VALUE), can we use groupByOp.getSchema()?





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "ngsg (via GitHub)" <gi...@apache.org>.
ngsg commented on PR #4043:
URL: https://github.com/apache/hive/pull/4043#issuecomment-1803898226

   @ayushtkn, sorry for the late response. I think I missed the GitHub notification about your question.
   This patch changes the schema of the GroupBy operator, which changes the stats in the qfiles.
   The GBY operator's third column is the bloom filter, so its type should be binary. However, the current DPPOptimization sets an incorrect type instead of the binary type. This patch fixes that issue, and the change of column type accounts for the difference in query statistics.
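   In other words, of the three GBY outputs (one each for min, max, and the bloom filter), only the first two should inherit the join key's type; a schematic sketch (simple name/type pairs standing in for Hive's ColumnInfo, type names illustrative):
   
   ```java
   import java.util.*;
   
   public class GbySchemaSketch {
       static List<Map.Entry<String, String>> gbyColInfos(String keyType) {
           // One column each for min, max and the bloom filter;
           // the bloom filter is serialized bytes, hence binary
           return List.of(
               Map.entry("_col0", keyType),   // min(key)
               Map.entry("_col1", keyType),   // max(key)
               Map.entry("_col2", "binary")); // bloom_filter(key)
       }
   
       public static void main(String[] args) {
           System.out.println(gbyColInfos("bigint"));
       }
   }
   ```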




[GitHub] [hive] github-actions[bot] closed pull request #4043: HIVE-27006: Fix ParallelEdgeFixer

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #4043: HIVE-27006: Fix ParallelEdgeFixer
URL: https://github.com/apache/hive/pull/4043




[GitHub] [hive] sonarcloud[bot] commented on pull request #4043: HIVE-27006: Fix ParallelEdgeFixer

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4043:
URL: https://github.com/apache/hive/pull/4043#issuecomment-1422479772

   Kudos, SonarCloud Quality Gate passed!&nbsp; &nbsp; [![Quality Gate passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png 'Quality Gate passed')](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4043)
   
   [![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG) [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG)  
   [![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY)  
   [![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT)  
   [![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL) [10 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL)
   
   [![No Coverage information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png 'No Coverage information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4043&metric=coverage&view=list) No Coverage information  
   [![No Duplication information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png 'No Duplication information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4043&metric=duplicated_lines_density&view=list) No Duplication information
   
   




[GitHub] [hive] github-actions[bot] commented on pull request #4043: HIVE-27006: Fix ParallelEdgeFixer

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #4043:
URL: https://github.com/apache/hive/pull/4043#issuecomment-1513946248

   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.




[GitHub] [hive] github-actions[bot] commented on pull request #4043: HIVE-27006: Fix ParallelEdgeFixer

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #4043:
URL: https://github.com/apache/hive/pull/4043#issuecomment-1647013310

   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.




Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "ngsg (via GitHub)" <gi...@apache.org>.
ngsg commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1387525437


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ParallelEdgeFixer.java:
##########
@@ -325,6 +331,11 @@ public static Optional<Set<String>> colMappingInverseKeys(ReduceSinkOperator rs)
     Map<String, String> ret = new HashMap<String, String>();
     Map<String, ExprNodeDesc> exprMap = rs.getColumnExprMap();
     Set<String> neededColumns = new HashSet<String>();
+
+    if (!rs.getSchema().getColumnNames().stream().allMatch(exprMap::containsKey)) {

Review Comment:
   Sure, I removed neededColumns and added comments to explain the reason for inversion failure.





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1389415882


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -678,38 +678,34 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
     ArrayList<ColumnInfo> groupbyColInfos = new ArrayList<ColumnInfo>();
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(0), key.getTypeInfo(), "", false));
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(1), key.getTypeInfo(), "", false));
-    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), key.getTypeInfo(), "", false));
+    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), TypeInfoFactory.binaryTypeInfo, "", false));
 
     GroupByOperator groupByOp = (GroupByOperator)OperatorFactory.getAndMakeChild(
             groupBy, new RowSchema(groupbyColInfos), selectOp);
 
     groupByOp.setColumnExprMap(new HashMap<String, ExprNodeDesc>());
 
     // Get the column names of the aggregations for reduce sink
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
     Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
-    for (int i = 0; i < aggs.size() - 1; i++) {
-      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos), "", false);
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    for (int colPos = 0; colPos < aggs.size(); colPos++) {
+      TypeInfo typInfo = groupbyColInfos.get(colPos).getType();
+      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(typInfo, gbOutputNames.get(colPos), "", false);
       rsValueCols.add(colExpr);
-      columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-      colPos++;
-    }
+      columnExprMap.put(Utilities.ReduceField.VALUE + "." + gbOutputNames.get(colPos), colExpr);
 
-    // Bloom Filter uses binary
-    ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(TypeInfoFactory.binaryTypeInfo,
-        gbOutputNames.get(colPos), "", false);
-    rsValueCols.add(colExpr);
-    columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-    colPos++;
+      ColumnInfo colInfo =

Review Comment:
   thank you @kgyrtkirk for chiming in! 
   I don't have much expertise in that area. Small question: 
   When, in FinalRsForSemiJoinOp, we don't prefix the columns with `ReduceField.VALUE` (in columnExprMap & schema), ParallelEdgeFixer kicks in and introduces a concentrator RS that expects [key, value] inputs:
   ````
   Caused by: java.lang.RuntimeException: cannot find field _col0 from [0:key, 1:value]
   	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:550)
   	at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:153)
   	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:56)
   	at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:1073)
   	at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:1099)
   	at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:74)
   	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
   	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:191)
   	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:292)
   ````
   Should we prefix the columns in FinalRsForSemiJoinOp or make some fixes in fixParallelEdge SEL inputs?





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388046180


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -678,38 +678,34 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
     ArrayList<ColumnInfo> groupbyColInfos = new ArrayList<ColumnInfo>();
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(0), key.getTypeInfo(), "", false));
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(1), key.getTypeInfo(), "", false));
-    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), key.getTypeInfo(), "", false));
+    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), TypeInfoFactory.binaryTypeInfo, "", false));
 
     GroupByOperator groupByOp = (GroupByOperator)OperatorFactory.getAndMakeChild(
             groupBy, new RowSchema(groupbyColInfos), selectOp);
 
     groupByOp.setColumnExprMap(new HashMap<String, ExprNodeDesc>());
 
     // Get the column names of the aggregations for reduce sink
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
     Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
-    for (int i = 0; i < aggs.size() - 1; i++) {
-      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos), "", false);
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    for (int colPos = 0; colPos < aggs.size(); colPos++) {
+      TypeInfo typInfo = groupbyColInfos.get(colPos).getType();
+      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(typInfo, gbOutputNames.get(colPos), "", false);
       rsValueCols.add(colExpr);
-      columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-      colPos++;
-    }
+      columnExprMap.put(Utilities.ReduceField.VALUE + "." + gbOutputNames.get(colPos), colExpr);
 
-    // Bloom Filter uses binary
-    ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(TypeInfoFactory.binaryTypeInfo,
-        gbOutputNames.get(colPos), "", false);
-    rsValueCols.add(colExpr);
-    columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-    colPos++;
+      ColumnInfo colInfo =

Review Comment:
   sorry for the dumb question, but why do we need the `ReduceField.VALUE` prefix in the RS operator? If that is required, how did it even work?





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388381233


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ParallelEdgeFixer.java:
##########
@@ -324,26 +330,25 @@ private static String extractColumnName(ExprNodeDesc expr) throws SemanticExcept
   public static Optional<Set<String>> colMappingInverseKeys(ReduceSinkOperator rs) {
     Map<String, String> ret = new HashMap<String, String>();
     Map<String, ExprNodeDesc> exprMap = rs.getColumnExprMap();
-    Set<String> neededColumns = new HashSet<String>();
+
+    if (!rs.getSchema().getColumnNames().stream().allMatch(exprMap::containsKey)) {
+      // Cannot invert RS because exprMap does not contain all of the RS's input columns.
+      return Optional.empty();
+    }
+
     try {
       for (Entry<String, ExprNodeDesc> e : exprMap.entrySet()) {
         String columnName = extractColumnName(e.getValue());
         if (rs.getSchema().getColumnInfo(e.getKey()) == null) {
-          // ignore incorrectly mapped columns (if there's any) - but require its input to be present
-          neededColumns.add(columnName);
-        } else {
-          ret.put(columnName, e.getKey());
+          // Cannot invert RS because the column needed by expression is not mapped correctly.
+          return Optional.empty();

Review Comment:
   it looks like `exprMap` might have duplicate columns {columnName} with different mappings {e.getKey()}, so I think we need to restore the previous version
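The ambiguity described here can be sketched with a toy (non-Hive) inversion routine; `ColMappingInverter`, `invert`, and the map shapes are illustrative names, not the actual `colMappingInverseKeys` code. When two output columns reference the same input column, the inverse mapping is not well-defined, so the sketch refuses inversion rather than silently dropping one mapping:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy sketch: invert an outputColumn -> inputColumn map, as a SEL operator
// inserted by a parallel-edge fixer would need. Duplicate input columns make
// the inverse ambiguous, so inversion is refused.
public class ColMappingInverter {

    static Optional<Map<String, String>> invert(Map<String, String> exprMap) {
        Map<String, String> inverse = new HashMap<>();
        for (Map.Entry<String, String> e : exprMap.entrySet()) {
            // e.getKey() = output column, e.getValue() = referenced input column
            if (inverse.putIfAbsent(e.getValue(), e.getKey()) != null) {
                return Optional.empty(); // same input column mapped twice
            }
        }
        return Optional.of(inverse);
    }

    public static void main(String[] args) {
        Map<String, String> ok = new HashMap<>();
        ok.put("VALUE._col0", "_col0");
        ok.put("VALUE._col1", "_col1");
        System.out.println(invert(ok).isPresent()); // true

        Map<String, String> dup = new HashMap<>();
        dup.put("VALUE._col0", "_col0");
        dup.put("VALUE._col1", "_col0"); // duplicate input column
        System.out.println(invert(dup).isPresent()); // false
    }
}
```

The `Optional.empty()` return mirrors the defensive bail-out discussed in this thread: returning a partial inverse would hide the ambiguity from the caller.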





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "ngsg (via GitHub)" <gi...@apache.org>.
ngsg commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1395752162


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -678,38 +678,34 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
     ArrayList<ColumnInfo> groupbyColInfos = new ArrayList<ColumnInfo>();
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(0), key.getTypeInfo(), "", false));
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(1), key.getTypeInfo(), "", false));
-    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), key.getTypeInfo(), "", false));
+    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), TypeInfoFactory.binaryTypeInfo, "", false));
 
     GroupByOperator groupByOp = (GroupByOperator)OperatorFactory.getAndMakeChild(
             groupBy, new RowSchema(groupbyColInfos), selectOp);
 
     groupByOp.setColumnExprMap(new HashMap<String, ExprNodeDesc>());
 
     // Get the column names of the aggregations for reduce sink
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
     Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
-    for (int i = 0; i < aggs.size() - 1; i++) {
-      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos), "", false);
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    for (int colPos = 0; colPos < aggs.size(); colPos++) {
+      TypeInfo typInfo = groupbyColInfos.get(colPos).getType();
+      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(typInfo, gbOutputNames.get(colPos), "", false);
       rsValueCols.add(colExpr);
-      columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-      colPos++;
-    }
+      columnExprMap.put(Utilities.ReduceField.VALUE + "." + gbOutputNames.get(colPos), colExpr);
 
-    // Bloom Filter uses binary
-    ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(TypeInfoFactory.binaryTypeInfo,
-        gbOutputNames.get(colPos), "", false);
-    rsValueCols.add(colExpr);
-    columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-    colPos++;
+      ColumnInfo colInfo =

Review Comment:
   @deniskuzZ, I have checked your comment and my work, and I summarized my conclusion as follows:
   
   1. about `ReduceField.VALUE`
   
   I think we should prefix the names of the RS operator's columns in colExprMap and schema, because RS's child operators always access their input columns (the output of the RS) as `KEY.col` and `VALUE.col`.
   
   An RS operator's output rows are transported to the next operator via shuffle, not by directly calling `Operator.forward()`. `ReduceRecordSource` reads the shuffled KV pairs and calls the child operator's `Operator.process()` on behalf of `Operator.forward()`. If vectorization is disabled, it passes a `List<Object>` of length 2 as a row, and the corresponding ObjectInspector consists of `ReduceField.KEY` and `ReduceField.VALUE`. [1] If vectorization is enabled, it passes a single struct object as a row, and the corresponding ObjectInspector consists of `ReduceField.KEY + "." + fieldName` and `ReduceField.VALUE + "." + fieldName`. [2] In both cases, the column names seen from the RS's child operator start with either `ReduceField.KEY` or `ReduceField.VALUE`.
   
   `colExprMap` maps an output column name to its expression [3], so the keys of an RS's `colExprMap` should be prefixed with `ReduceField.KEY` or `ReduceField.VALUE`. I could not find any documentation on `schema`, but it seems that `schema` also holds output column names. [4] So I think both the keys of the `colExprMap` and the `schema` of an RS should be prefixed with `ReduceField.KEY` or `ReduceField.VALUE`.
   
   2. about your investigation
   
   DPPOptimization creates 4 operators, GBY->RS->GBY->RS, and `sharedwork_semi_2.q` tests PEF by inverting one of the final RS operators that DPPOptimization created. PEF refers to the RS's `colExprMap` when creating the SEL operator that performs the inversion. That's why the test fails with `java.lang.RuntimeException: cannot find field _col0 from [0:key, 1:value]` if we do not prefix the keys of the final RS's `colExprMap` with `ReduceField.VALUE`. Unlike the final RS operators, the intermediate RS operators are not inverted during the test, so prefixing the intermediate RS operators' column names does not affect the result of the test.
   
   3. about `ParallelEdgeFixer.colMappingInverseKeys()`
   
   According to the comment on `Operator.getColumnExprMap()`, it returns only key columns for RS operators. [3] I'm not sure whether that is still valid, but I want to keep the added code as a kind of defensive programming.
   
   [1] https://github.com/apache/hive/blob/8a4f5ce7275842ff4f1cc917c7a2a48dde71bf4c/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java#L229-L231
   [2] https://github.com/apache/hive/blob/8a4f5ce7275842ff4f1cc917c7a2a48dde71bf4c/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L4333-L4341
   [3]
   https://github.com/apache/hive/blob/8a4f5ce7275842ff4f1cc917c7a2a48dde71bf4c/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java#L991-L997
   [4]
   https://github.com/apache/hive/blob/8a4f5ce7275842ff4f1cc917c7a2a48dde71bf4c/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CountDistinctRewriteProc.java#L443-L447
   
   
   





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "kgyrtkirk (via GitHub)" <gi...@apache.org>.
kgyrtkirk commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1389331356


##########
ql/src/test/queries/clientpositive/sharedwork_semi_2.q:
##########
@@ -0,0 +1,104 @@
+set hive.auto.convert.anti.join=false;
+set hive.auto.convert.join.noconditionaltask.size=1145044992;
+set hive.auto.convert.join.noconditionaltask=true;
+set hive.auto.convert.join=true;
+set hive.auto.convert.sortmerge.join.to.mapjoin=true;
+set hive.auto.convert.sortmerge.join=true;
+set hive.cbo.enable=true;
+set hive.convert.join.bucket.mapjoin.tez=true;
+set hive.enforce.sortmergebucketmapjoin=true;
+set hive.exec.dynamic.partition.mode=nonstrict;
+set hive.exec.dynamic.partition=true;
+set hive.exec.max.dynamic.partitions.pernode=4000;
+set hive.exec.max.dynamic.partitions=10000;
+set hive.exec.parallel.thread.number=32;
+set hive.exec.parallel=false;

Review Comment:
   why do you need to set all these irrelevant options?





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1389297013


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -678,38 +678,34 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
     ArrayList<ColumnInfo> groupbyColInfos = new ArrayList<ColumnInfo>();
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(0), key.getTypeInfo(), "", false));
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(1), key.getTypeInfo(), "", false));
-    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), key.getTypeInfo(), "", false));
+    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), TypeInfoFactory.binaryTypeInfo, "", false));
 
     GroupByOperator groupByOp = (GroupByOperator)OperatorFactory.getAndMakeChild(
             groupBy, new RowSchema(groupbyColInfos), selectOp);
 
     groupByOp.setColumnExprMap(new HashMap<String, ExprNodeDesc>());
 
     // Get the column names of the aggregations for reduce sink
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
     Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
-    for (int i = 0; i < aggs.size() - 1; i++) {
-      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos), "", false);
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    for (int colPos = 0; colPos < aggs.size(); colPos++) {
+      TypeInfo typInfo = groupbyColInfos.get(colPos).getType();
+      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(typInfo, gbOutputNames.get(colPos), "", false);
       rsValueCols.add(colExpr);
-      columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-      colPos++;
-    }
+      columnExprMap.put(Utilities.ReduceField.VALUE + "." + gbOutputNames.get(colPos), colExpr);
 
-    // Bloom Filter uses binary
-    ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(TypeInfoFactory.binaryTypeInfo,
-        gbOutputNames.get(colPos), "", false);
-    rsValueCols.add(colExpr);
-    columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-    colPos++;
+      ColumnInfo colInfo =

Review Comment:
   @ngsg, patch LGTM; however, could we please keep just the minimum required changes and do a minor refactor as in the above snippet?





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "ngsg (via GitHub)" <gi...@apache.org>.
ngsg commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1398941508


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -678,38 +678,34 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
     ArrayList<ColumnInfo> groupbyColInfos = new ArrayList<ColumnInfo>();
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(0), key.getTypeInfo(), "", false));
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(1), key.getTypeInfo(), "", false));
-    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), key.getTypeInfo(), "", false));
+    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), TypeInfoFactory.binaryTypeInfo, "", false));
 
     GroupByOperator groupByOp = (GroupByOperator)OperatorFactory.getAndMakeChild(
             groupBy, new RowSchema(groupbyColInfos), selectOp);
 
     groupByOp.setColumnExprMap(new HashMap<String, ExprNodeDesc>());
 
     // Get the column names of the aggregations for reduce sink
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
     Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
-    for (int i = 0; i < aggs.size() - 1; i++) {
-      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos), "", false);
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    for (int colPos = 0; colPos < aggs.size(); colPos++) {
+      TypeInfo typInfo = groupbyColInfos.get(colPos).getType();
+      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(typInfo, gbOutputNames.get(colPos), "", false);
       rsValueCols.add(colExpr);
-      columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-      colPos++;
-    }
+      columnExprMap.put(Utilities.ReduceField.VALUE + "." + gbOutputNames.get(colPos), colExpr);
 
-    // Bloom Filter uses binary
-    ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(TypeInfoFactory.binaryTypeInfo,
-        gbOutputNames.get(colPos), "", false);
-    rsValueCols.add(colExpr);
-    columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-    colPos++;
+      ColumnInfo colInfo =

Review Comment:
   @deniskuzZ, I restored the intermediate RS creation code and refactored some of the code as in your snippet above. Could you please review the change? Thank you.





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388649716


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -678,38 +678,34 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
     ArrayList<ColumnInfo> groupbyColInfos = new ArrayList<ColumnInfo>();
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(0), key.getTypeInfo(), "", false));
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(1), key.getTypeInfo(), "", false));
-    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), key.getTypeInfo(), "", false));
+    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), TypeInfoFactory.binaryTypeInfo, "", false));
 
     GroupByOperator groupByOp = (GroupByOperator)OperatorFactory.getAndMakeChild(
             groupBy, new RowSchema(groupbyColInfos), selectOp);
 
     groupByOp.setColumnExprMap(new HashMap<String, ExprNodeDesc>());
 
     // Get the column names of the aggregations for reduce sink
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
     Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
-    for (int i = 0; i < aggs.size() - 1; i++) {
-      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos), "", false);
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    for (int colPos = 0; colPos < aggs.size(); colPos++) {
+      TypeInfo typInfo = groupbyColInfos.get(colPos).getType();
+      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(typInfo, gbOutputNames.get(colPos), "", false);
       rsValueCols.add(colExpr);
-      columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-      colPos++;
-    }
+      columnExprMap.put(Utilities.ReduceField.VALUE + "." + gbOutputNames.get(colPos), colExpr);
 
-    // Bloom Filter uses binary
-    ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(TypeInfoFactory.binaryTypeInfo,
-        gbOutputNames.get(colPos), "", false);
-    rsValueCols.add(colExpr);
-    columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-    colPos++;
+      ColumnInfo colInfo =

Review Comment:
   I've removed the ReduceField.VALUE prefix from generateSemiJoinOperatorPlan but kept it in createFinalRsForSemiJoinOp --> sharedwork_semi_2.q passed and produced the same result
   ````
   **generateSemiJoinOperatorPlan**
       for (int i = 0; i < aggs.size(); i++) {
         ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(groupbyColInfos.get(i).getType(), 
                 gbOutputNames.get(i), "", false);
         rsValueCols.add(colExpr);
         columnExprMap.put(gbOutputNames.get(i), colExpr);
       }
   **createFinalRsForSemiJoinOp**
       Map<String, ExprNodeDesc> columnExprMap = new HashMap<>();
    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
       for (ColumnInfo rsColInfo : gb.getSchema().getSignature()) {
         gbOutputNames.add(rsColInfo.getInternalName());
   
         ExprNodeColumnDesc rsValExpr = new ExprNodeColumnDesc(rsColInfo.getType(), rsColInfo.getInternalName(), "", false);
         rsValueCols.add(rsValExpr);
    
      String field = Utilities.ReduceField.VALUE.toString() + "." + rsColInfo.getInternalName();
         columnExprMap.put(field, rsValExpr);
         rsColInfos.add(new ColumnInfo(field, rsColInfo.getType(), "", false));
       }
   ````
   Also, that didn't require changes in ParallelEdgeFixer.colMappingInverseKeys; exprMap wasn't empty.





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388046180


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -678,38 +678,34 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
     ArrayList<ColumnInfo> groupbyColInfos = new ArrayList<ColumnInfo>();
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(0), key.getTypeInfo(), "", false));
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(1), key.getTypeInfo(), "", false));
-    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), key.getTypeInfo(), "", false));
+    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), TypeInfoFactory.binaryTypeInfo, "", false));
 
     GroupByOperator groupByOp = (GroupByOperator)OperatorFactory.getAndMakeChild(
             groupBy, new RowSchema(groupbyColInfos), selectOp);
 
     groupByOp.setColumnExprMap(new HashMap<String, ExprNodeDesc>());
 
     // Get the column names of the aggregations for reduce sink
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
     Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
-    for (int i = 0; i < aggs.size() - 1; i++) {
-      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos), "", false);
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    for (int colPos = 0; colPos < aggs.size(); colPos++) {
+      TypeInfo typInfo = groupbyColInfos.get(colPos).getType();
+      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(typInfo, gbOutputNames.get(colPos), "", false);
       rsValueCols.add(colExpr);
-      columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-      colPos++;
-    }
+      columnExprMap.put(Utilities.ReduceField.VALUE + "." + gbOutputNames.get(colPos), colExpr);
 
-    // Bloom Filter uses binary
-    ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(TypeInfoFactory.binaryTypeInfo,
-        gbOutputNames.get(colPos), "", false);
-    rsValueCols.add(colExpr);
-    columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-    colPos++;
+      ColumnInfo colInfo =

Review Comment:
   sorry for the dumb question, but why do we need the `ReduceField.VALUE` prefix in the RS operator? If that is required, how did it even work before?
   In `SemiJoinReductionMerge.createGroupByAggregationParameters` the ExprNodeColumnDesc colName is prefixed as well. Maybe @kgyrtkirk or @kasakrisz can help.





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388046180


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -678,38 +678,34 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
     ArrayList<ColumnInfo> groupbyColInfos = new ArrayList<ColumnInfo>();
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(0), key.getTypeInfo(), "", false));
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(1), key.getTypeInfo(), "", false));
-    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), key.getTypeInfo(), "", false));
+    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), TypeInfoFactory.binaryTypeInfo, "", false));
 
     GroupByOperator groupByOp = (GroupByOperator)OperatorFactory.getAndMakeChild(
             groupBy, new RowSchema(groupbyColInfos), selectOp);
 
     groupByOp.setColumnExprMap(new HashMap<String, ExprNodeDesc>());
 
     // Get the column names of the aggregations for reduce sink
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
     Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
-    for (int i = 0; i < aggs.size() - 1; i++) {
-      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos), "", false);
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    for (int colPos = 0; colPos < aggs.size(); colPos++) {
+      TypeInfo typInfo = groupbyColInfos.get(colPos).getType();
+      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(typInfo, gbOutputNames.get(colPos), "", false);
       rsValueCols.add(colExpr);
-      columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-      colPos++;
-    }
+      columnExprMap.put(Utilities.ReduceField.VALUE + "." + gbOutputNames.get(colPos), colExpr);
 
-    // Bloom Filter uses binary
-    ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(TypeInfoFactory.binaryTypeInfo,
-        gbOutputNames.get(colPos), "", false);
-    rsValueCols.add(colExpr);
-    columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-    colPos++;
+      ColumnInfo colInfo =

Review Comment:
   sorry for the dumb question, but why do we need `ReduceField.VALUE` prefix in  RS operator? if that is required, how did it even work before? 
   in `SemiJoinReductionMerge.createGroupByAggregationParameters` ExprNodeColumnDesc colName is prefixed as well. maybe @kgyrtkirk or @kasakrisz can help



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388381233


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ParallelEdgeFixer.java:
##########
@@ -324,26 +330,25 @@ private static String extractColumnName(ExprNodeDesc expr) throws SemanticExcept
   public static Optional<Set<String>> colMappingInverseKeys(ReduceSinkOperator rs) {
     Map<String, String> ret = new HashMap<String, String>();
     Map<String, ExprNodeDesc> exprMap = rs.getColumnExprMap();
-    Set<String> neededColumns = new HashSet<String>();
+
+    if (!rs.getSchema().getColumnNames().stream().allMatch(exprMap::containsKey)) {
+      // Cannot invert RS because exprMap does not contain all of the RS's input columns.
+      return Optional.empty();
+    }
+
     try {
       for (Entry<String, ExprNodeDesc> e : exprMap.entrySet()) {
         String columnName = extractColumnName(e.getValue());
         if (rs.getSchema().getColumnInfo(e.getKey()) == null) {
-          // ignore incorrectly mapped columns (if there's any) - but require its input to be present
-          neededColumns.add(columnName);
-        } else {
-          ret.put(columnName, e.getKey());
+          // Cannot invert RS because the column needed by expression is not mapped correctly.
+          return Optional.empty();

Review Comment:
   it looks like `exprMap` might have duplicate columns {columnName} with different mappings {e.getKey()}, so I think we need to restore to prev version
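
For readers following along: the hazard described here, an exprMap whose values repeat the same input column under different output keys, can be sketched outside Hive with plain collections. This is an illustrative toy only, not Hive code; the class and method names are invented for the example. Inverting an output-to-input column map is only safe when the mapping is injective, so a duplicated input column means the inversion must give up:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch only (not Hive code): inverting an output->input column
// map is well-defined only when the mapping is injective. If two output
// columns read the same input column, the inverse is ambiguous and we bail.
public class ColMappingInverter {
    public static Optional<Map<String, String>> invert(Map<String, String> exprMap) {
        Map<String, String> inverse = new HashMap<>();
        for (Map.Entry<String, String> e : exprMap.entrySet()) {
            String inputCol = e.getValue();
            // put() returns the previous mapping, if any: a non-null result
            // means the same input column appeared twice.
            if (inverse.put(inputCol, e.getKey()) != null) {
                return Optional.empty();
            }
        }
        return Optional.of(inverse);
    }

    public static void main(String[] args) {
        Map<String, String> ok = new HashMap<>();
        ok.put("VALUE._col0", "a");
        ok.put("VALUE._col1", "b");
        System.out.println(invert(ok).isPresent());  // true

        Map<String, String> dup = new HashMap<>();
        dup.put("VALUE._col0", "a");
        dup.put("VALUE._col1", "a");  // same input column mapped twice
        System.out.println(invert(dup).isPresent()); // false
    }
}
```

This mirrors why the earlier version of `colMappingInverseKeys` tracked problematic columns instead of inverting unconditionally.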





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "ngsg (via GitHub)" <gi...@apache.org>.
ngsg commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388097859


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -678,38 +678,34 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
     ArrayList<ColumnInfo> groupbyColInfos = new ArrayList<ColumnInfo>();
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(0), key.getTypeInfo(), "", false));
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(1), key.getTypeInfo(), "", false));
-    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), key.getTypeInfo(), "", false));
+    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), TypeInfoFactory.binaryTypeInfo, "", false));
 
     GroupByOperator groupByOp = (GroupByOperator)OperatorFactory.getAndMakeChild(
             groupBy, new RowSchema(groupbyColInfos), selectOp);
 
     groupByOp.setColumnExprMap(new HashMap<String, ExprNodeDesc>());
 
     // Get the column names of the aggregations for reduce sink
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
     Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
-    for (int i = 0; i < aggs.size() - 1; i++) {
-      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos), "", false);
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    for (int colPos = 0; colPos < aggs.size(); colPos++) {
+      TypeInfo typInfo = groupbyColInfos.get(colPos).getType();
+      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(typInfo, gbOutputNames.get(colPos), "", false);
       rsValueCols.add(colExpr);
-      columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-      colPos++;
-    }
+      columnExprMap.put(Utilities.ReduceField.VALUE + "." + gbOutputNames.get(colPos), colExpr);
 
-    // Bloom Filter uses binary
-    ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(TypeInfoFactory.binaryTypeInfo,
-        gbOutputNames.get(colPos), "", false);
-    rsValueCols.add(colExpr);
-    columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-    colPos++;
+      ColumnInfo colInfo =

Review Comment:
   (Please understand that I may be wrong as it has been a long time since I worked on this issue. I'll check this issue again and leave you a comment if any corrections are needed.)
   An RS operator forwards key-value pairs to its child operator, and an RS operator's columns are separated into KEY and VALUE by prepending `ReduceField.KEY` and `ReduceField.VALUE`.
   According to our records, we encountered a "cannot find field _col0 from [0:key, 1:value]" issue when we set the correct ColumnExprMap on the GBY operator. We found that the SEL operator generated by PEF tries to read `_col0`, but the RS operator only provides `VALUE._col0`. So we prepend the prefix in the column schema, as PEF refers to it during SEL operator creation.
   I can't remember why DPPOptimization worked okay without `ReduceField.KEY/VALUE`, but our records say we checked that other components that create RS operators also use `ReduceField.KEY/VALUE` when setting the schema of the RS operator.
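
As a side note for readers of this thread, the keying convention described above can be sketched with plain collections. This is an illustrative toy, not Hive's actual classes; all names here are invented for the example. The point is that the map keys carry the `VALUE.` prefix, so a child resolving the prefixed name finds a mapping while the bare column name does not:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: models how a ReduceSink-style operator keys its
// column-expression map with a "VALUE." prefix, so that a child resolving
// "VALUE._col0" finds a mapping while a bare "_col0" does not.
public class ReduceFieldSketch {
    enum ReduceField { KEY, VALUE }

    public static Map<String, String> buildColumnExprMap(String... outputNames) {
        Map<String, String> columnExprMap = new HashMap<>();
        for (String name : outputNames) {
            // The map key carries the VALUE prefix; the mapped expression
            // still refers to the producer's unprefixed column name.
            columnExprMap.put(ReduceField.VALUE + "." + name, name);
        }
        return columnExprMap;
    }

    public static void main(String[] args) {
        Map<String, String> m = buildColumnExprMap("_col0", "_col1");
        System.out.println(m.containsKey("VALUE._col0")); // true
        System.out.println(m.containsKey("_col0"));       // false
    }
}
```

Under this convention, a downstream SEL that reads `_col0` without the prefix would fail to resolve the column, which matches the error quoted above.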





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1393173475


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -678,38 +678,34 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
     ArrayList<ColumnInfo> groupbyColInfos = new ArrayList<ColumnInfo>();
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(0), key.getTypeInfo(), "", false));
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(1), key.getTypeInfo(), "", false));
-    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), key.getTypeInfo(), "", false));
+    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), TypeInfoFactory.binaryTypeInfo, "", false));
 
     GroupByOperator groupByOp = (GroupByOperator)OperatorFactory.getAndMakeChild(
             groupBy, new RowSchema(groupbyColInfos), selectOp);
 
     groupByOp.setColumnExprMap(new HashMap<String, ExprNodeDesc>());
 
     // Get the column names of the aggregations for reduce sink
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
     Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
-    for (int i = 0; i < aggs.size() - 1; i++) {
-      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos), "", false);
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    for (int colPos = 0; colPos < aggs.size(); colPos++) {
+      TypeInfo typInfo = groupbyColInfos.get(colPos).getType();
+      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(typInfo, gbOutputNames.get(colPos), "", false);
       rsValueCols.add(colExpr);
-      columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-      colPos++;
-    }
+      columnExprMap.put(Utilities.ReduceField.VALUE + "." + gbOutputNames.get(colPos), colExpr);
 
-    // Bloom Filter uses binary
-    ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(TypeInfoFactory.binaryTypeInfo,
-        gbOutputNames.get(colPos), "", false);
-    rsValueCols.add(colExpr);
-    columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-    colPos++;
+      ColumnInfo colInfo =

Review Comment:
   @ngsg, could you please check the comment and code snippets from https://github.com/apache/hive/pull/4043#discussion_r1388649716, other than that it LGTM





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "kgyrtkirk (via GitHub)" <gi...@apache.org>.
kgyrtkirk commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1389339624


##########
ql/src/test/results/clientpositive/perf/tpcds30tb/tez/query2.q.out:
##########
@@ -11,7 +11,7 @@ STAGE PLANS:
         Map 11 <- Map 6 (BROADCAST_EDGE), Union 12 (CONTAINS)
         Map 14 <- Map 6 (BROADCAST_EDGE), Union 12 (CONTAINS)
         Map 5 <- Map 6 (BROADCAST_EDGE), Union 2 (CONTAINS)
-        Map 6 <- Reducer 8 (BROADCAST_EDGE), Reducer 9 (BROADCAST_EDGE)
+        Map 6 <- Reducer 10 (BROADCAST_EDGE), Reducer 8 (BROADCAST_EDGE), Reducer 9 (BROADCAST_EDGE)

Review Comment:
   this seems odd to me: I wonder why the sudden need for `Reducer 10` in `Map 6`?
   * there are no changes in `Map 6`
   * `Map 6` refers only to `RS_47` and `RS_51`; so it's odd that it has 3 `Reducers`
   
   





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "ngsg (via GitHub)" <gi...@apache.org>.
ngsg commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1390718101


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ParallelEdgeFixer.java:
##########
@@ -325,6 +331,11 @@ public static Optional<Set<String>> colMappingInverseKeys(ReduceSinkOperator rs)
     Map<String, String> ret = new HashMap<String, String>();
     Map<String, ExprNodeDesc> exprMap = rs.getColumnExprMap();
     Set<String> neededColumns = new HashSet<String>();
+
+    if (!rs.getSchema().getColumnNames().stream().allMatch(exprMap::containsKey)) {

Review Comment:
   Thanks for pointing that out. I reverted it to the previous version. 





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1400802755


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -796,31 +787,29 @@ private void createFinalRsForSemiJoinOp(
           ExprNodeDesc key, String keyBaseAlias, ExprNodeDesc colExpr,
           boolean isHint) throws SemanticException {
     ArrayList<String> gbOutputNames = new ArrayList<>();
-    // One each for min, max and bloom filter
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(0));
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(1));
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(2));
-
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
-    for (int i = 0; i < gbOutputNames.size() - 1; i++) {
-      ExprNodeColumnDesc expr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos++), "", false);
-      rsValueCols.add(expr);
+    Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    List<ColumnInfo> gbySchema = gb.getSchema().getSignature();
+    for (int colPos = 0; colPos < gbySchema.size(); colPos++) {

Review Comment:
   use for-each
   ````
       for (ColumnInfo rsColInfo : gb.getSchema().getSignature()) {
         String outputColName = rsColInfo.getInternalName();
         gbOutputNames.add(outputColName);
   
         ExprNodeColumnDesc rsValExpr = new ExprNodeColumnDesc(rsColInfo.getType(), outputColName, "", false);
         rsValueCols.add(rsValExpr);
    
         String field = Utilities.ReduceField.VALUE + "." + outputColName;
         columnExprMap.put(field, rsValExpr);
         rsColInfos.add(new ColumnInfo(field, rsColInfo.getType(), "", false));
       }
   ````





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1400801699


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -782,7 +773,7 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
             groupByMemoryUsage, memoryThreshold, minReductionHashAggr, minReductionHashAggrLowerBound,
             null, false, 0, false);
     GroupByOperator groupByOpFinal = (GroupByOperator)OperatorFactory.getAndMakeChild(
-            groupByDescFinal, new RowSchema(rsOp.getSchema()), rsOp);
+            groupByDescFinal, new RowSchema(groupByOp.getSchema()), rsOp);

Review Comment:
   please revert





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388458246


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ParallelEdgeFixer.java:
##########
@@ -325,6 +331,11 @@ public static Optional<Set<String>> colMappingInverseKeys(ReduceSinkOperator rs)
     Map<String, String> ret = new HashMap<String, String>();
     Map<String, ExprNodeDesc> exprMap = rs.getColumnExprMap();
     Set<String> neededColumns = new HashSet<String>();
+
+    if (!rs.getSchema().getColumnNames().stream().allMatch(exprMap::containsKey)) {

Review Comment:
   it looks like `exprMap` might have duplicate columns {columnName} with different mappings {e.getKey()}, so I think we need to restore to prev version (revert #27f2d73)





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "ngsg (via GitHub)" <gi...@apache.org>.
ngsg commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388011551


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -678,38 +678,34 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
     ArrayList<ColumnInfo> groupbyColInfos = new ArrayList<ColumnInfo>();
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(0), key.getTypeInfo(), "", false));
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(1), key.getTypeInfo(), "", false));
-    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), key.getTypeInfo(), "", false));
+    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), TypeInfoFactory.binaryTypeInfo, "", false));
 
     GroupByOperator groupByOp = (GroupByOperator)OperatorFactory.getAndMakeChild(
             groupBy, new RowSchema(groupbyColInfos), selectOp);
 
     groupByOp.setColumnExprMap(new HashMap<String, ExprNodeDesc>());
 
     // Get the column names of the aggregations for reduce sink
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
     Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
-    for (int i = 0; i < aggs.size() - 1; i++) {
-      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos), "", false);
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    for (int colPos = 0; colPos < aggs.size(); colPos++) {
+      TypeInfo typInfo = groupbyColInfos.get(colPos).getType();
+      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(typInfo, gbOutputNames.get(colPos), "", false);
       rsValueCols.add(colExpr);
-      columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-      colPos++;
-    }
+      columnExprMap.put(Utilities.ReduceField.VALUE + "." + gbOutputNames.get(colPos), colExpr);
 
-    // Bloom Filter uses binary
-    ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(TypeInfoFactory.binaryTypeInfo,
-        gbOutputNames.get(colPos), "", false);
-    rsValueCols.add(colExpr);
-    columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-    colPos++;
+      ColumnInfo colInfo =

Review Comment:
   According to our records, groupByOp.getSchema() does not contain `Utilities.ReduceField.VALUE`, which is required by the RS operator.





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1387906470


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ParallelEdgeFixer.java:
##########
@@ -325,6 +331,11 @@ public static Optional<Set<String>> colMappingInverseKeys(ReduceSinkOperator rs)
     Map<String, String> ret = new HashMap<String, String>();
     Map<String, ExprNodeDesc> exprMap = rs.getColumnExprMap();
     Set<String> neededColumns = new HashSet<String>();
+
+    if (!rs.getSchema().getColumnNames().stream().allMatch(exprMap::containsKey)) {

Review Comment:
   @ngsg, is there a way to reproduce the issue with a smaller dataset?






[GitHub] [hive] sonarcloud[bot] commented on pull request #4043: HIVE-27006: Fix ParallelEdgeFixer

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4043:
URL: https://github.com/apache/hive/pull/4043#issuecomment-1722318169

   Kudos, SonarCloud Quality Gate passed! (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4043)
   
   0 Bugs (rating A)
   0 Vulnerabilities (rating A)
   0 Security Hotspots (rating A)
   6 Code Smells (rating A)
   
   No Coverage information
   No Duplication information
   
   Warning: the version of Java (11.0.8) used to run this analysis is deprecated and will soon no longer be accepted. Please update to at least Java 17.
   Read more: https://docs.sonarcloud.io/appendices/scanner-environment/
   

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388381233


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ParallelEdgeFixer.java:
##########
@@ -324,26 +330,25 @@ private static String extractColumnName(ExprNodeDesc expr) throws SemanticExcept
   public static Optional<Set<String>> colMappingInverseKeys(ReduceSinkOperator rs) {
     Map<String, String> ret = new HashMap<String, String>();
     Map<String, ExprNodeDesc> exprMap = rs.getColumnExprMap();
-    Set<String> neededColumns = new HashSet<String>();
+
+    if (!rs.getSchema().getColumnNames().stream().allMatch(exprMap::containsKey)) {
+      // Cannot invert RS because exprMap does not contain all of the RS's input columns.
+      return Optional.empty();
+    }
+
     try {
       for (Entry<String, ExprNodeDesc> e : exprMap.entrySet()) {
         String columnName = extractColumnName(e.getValue());
         if (rs.getSchema().getColumnInfo(e.getKey()) == null) {
-          // ignore incorrectly mapped columns (if there's any) - but require its input to be present
-          neededColumns.add(columnName);
-        } else {
-          ret.put(columnName, e.getKey());
+          // Cannot invert RS because the column needed by expression is not mapped correctly.
+          return Optional.empty();

Review Comment:
   It looks like `exprMap` might have duplicate columns (`columnName`) with different mappings (`e.getKey()`), so I think we need to restore the previous version.
   How about this:
   ````
   List<String> neededColumns = rs.getSchema().getColumnNames();
   try {
     for (Entry<String, ExprNodeDesc> e : exprMap.entrySet()) {
       String columnName = extractColumnName(e.getValue());
       if (rs.getSchema().getColumnInfo(e.getKey()) != null) {
         // ignore incorrectly mapped columns (if there's any) - but require its input to be present
         ret.put(columnName, e.getKey());
       }
     }
     neededColumns.removeAll(ret.keySet());
     if (!neededColumns.isEmpty()) {
       // There is no way to compute all parts of neededColumns
       return Optional.empty();
     }
   ````
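
   The duplicate-mapping concern can be illustrated outside Hive with a minimal standalone sketch (the class and column names below are hypothetical, not Hive's actual API): inverting an output-to-input column map only has a well-defined result when every input column appears exactly once, and a plain `HashMap.put` would otherwise silently keep just one of the conflicting entries.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class InverseMappingSketch {

    // Invert an {outputColumn -> inputColumn} map. Returns empty when two
    // outputs reference the same input, because no unique inverse exists.
    static Optional<Map<String, String>> invert(Map<String, String> colMapping) {
        Map<String, String> inverse = new HashMap<>();
        for (Map.Entry<String, String> e : colMapping.entrySet()) {
            if (inverse.put(e.getValue(), e.getKey()) != null) {
                return Optional.empty(); // duplicate input column detected
            }
        }
        return Optional.of(inverse);
    }

    public static void main(String[] args) {
        Map<String, String> unique = new LinkedHashMap<>();
        unique.put("VALUE._col0", "_col0");
        unique.put("VALUE._col1", "_col1");
        System.out.println(invert(unique).isPresent()); // true

        Map<String, String> duplicated = new LinkedHashMap<>();
        duplicated.put("VALUE._col0", "_col0");
        duplicated.put("VALUE._col1", "_col0"); // same input column twice
        System.out.println(invert(duplicated).isPresent()); // false
    }
}
```

   In this sketch the duplicate is rejected explicitly, whereas the code under review would need to either reject it or tolerate it the way the pre-patch version did.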





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "kgyrtkirk (via GitHub)" <gi...@apache.org>.
kgyrtkirk commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1389344594


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java:
##########
@@ -1916,18 +1916,12 @@ public RelaxedVertexEdgePredicate(EnumSet<EdgeType> nonTraverseableEdgeTypes) {
 
     @Override
     public boolean accept(Operator<?> s, Operator<?> t, OpEdge opEdge) {
-      if (!traverseableEdgeTypes.contains(opEdge.getEdgeType())) {
-        return true;
-      }
-      if (s instanceof ReduceSinkOperator) {
-        ReduceSinkOperator rs = (ReduceSinkOperator) s;
-        if (!ParallelEdgeFixer.colMappingInverseKeys(rs).isPresent()) {
-          return true;
-        }
-      }
-      return false;
-    }
+      boolean notTraverseable = !traverseableEdgeTypes.contains(opEdge.getEdgeType());
+      boolean notInvertible = (s instanceof ReduceSinkOperator) &&
+          !ParallelEdgeFixer.colMappingInverseKeys((ReduceSinkOperator) s).isPresent();
 
+      return notTraverseable || notInvertible;

Review Comment:
   I think it's better not to change something that is not broken...
   
   The previous version deliberately returned early without calling `PEF#colMIK` when the edge type did not match: what if, for some reason, it starts throwing exceptions for irrelevant cases?
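
   The evaluation-order concern can be sketched in isolation (the supplier-based helpers below are illustrative, not the actual `accept` implementation): with the early-return form the potentially-throwing check never runs for non-matching edge types, while computing both operands into locals before combining them evaluates it unconditionally.

```java
import java.util.function.BooleanSupplier;

public class EvaluationOrderSketch {

    // Early-return style: the potentially-throwing check is reached only
    // when the cheap condition does not already decide the result.
    static boolean acceptGuarded(boolean notTraverseable, BooleanSupplier notInvertible) {
        if (notTraverseable) {
            return true;
        }
        return notInvertible.getAsBoolean();
    }

    // Local-variable style: both operands are computed up front, so a
    // throwing check fails even when the edge type alone decides the result.
    static boolean acceptEager(boolean notTraverseable, BooleanSupplier notInvertible) {
        boolean invertCheck = notInvertible.getAsBoolean(); // always evaluated
        return notTraverseable || invertCheck;
    }

    public static void main(String[] args) {
        BooleanSupplier throwing = () -> {
            throw new IllegalStateException("inverse-key computation failed");
        };
        System.out.println(acceptGuarded(true, throwing)); // true, check skipped
        try {
            acceptEager(true, throwing);
        } catch (IllegalStateException e) {
            System.out.println("eager form evaluated the check anyway");
        }
    }
}
```

   Note that `||` itself still short-circuits; the difference comes from hoisting the second operand into a local variable before the `||` is reached.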
   





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388649716


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -678,38 +678,34 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
     ArrayList<ColumnInfo> groupbyColInfos = new ArrayList<ColumnInfo>();
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(0), key.getTypeInfo(), "", false));
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(1), key.getTypeInfo(), "", false));
-    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), key.getTypeInfo(), "", false));
+    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), TypeInfoFactory.binaryTypeInfo, "", false));
 
     GroupByOperator groupByOp = (GroupByOperator)OperatorFactory.getAndMakeChild(
             groupBy, new RowSchema(groupbyColInfos), selectOp);
 
     groupByOp.setColumnExprMap(new HashMap<String, ExprNodeDesc>());
 
     // Get the column names of the aggregations for reduce sink
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
     Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
-    for (int i = 0; i < aggs.size() - 1; i++) {
-      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos), "", false);
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    for (int colPos = 0; colPos < aggs.size(); colPos++) {
+      TypeInfo typInfo = groupbyColInfos.get(colPos).getType();
+      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(typInfo, gbOutputNames.get(colPos), "", false);
       rsValueCols.add(colExpr);
-      columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-      colPos++;
-    }
+      columnExprMap.put(Utilities.ReduceField.VALUE + "." + gbOutputNames.get(colPos), colExpr);
 
-    // Bloom Filter uses binary
-    ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(TypeInfoFactory.binaryTypeInfo,
-        gbOutputNames.get(colPos), "", false);
-    rsValueCols.add(colExpr);
-    columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-    colPos++;
+      ColumnInfo colInfo =

Review Comment:
   I've removed the `ReduceField.VALUE` prefix from `generateSemiJoinOperatorPlan` but kept it in `createFinalRsForSemiJoinOp`; `sharedwork_semi_2.q` passed and produced the same result:
   ````
   **generateSemiJoinOperatorPlan**
       for (int i = 0; i < aggs.size(); i++) {
         ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(groupbyColInfos.get(i).getType(), 
                 gbOutputNames.get(i), "", false);
         rsValueCols.add(colExpr);
         columnExprMap.put(gbOutputNames.get(i), colExpr);
       }
   **createFinalRsForSemiJoinOp**
       ArrayList<ColumnInfo> gbySchema = new ArrayList<>();
       Map<String, ExprNodeDesc> columnExprMap = new HashMap<>();
       for (int i = 0; i < gb.getSchema().getSignature().size(); i++) {
         ColumnInfo rsColInfo = gb.getSchema().getSignature().get(i);
         ExprNodeColumnDesc rsValExpr = new ExprNodeColumnDesc(rsColInfo.getType(), rsColInfo.getInternalName(), "", false);
         rsValueCols.add(rsValExpr);
   
         String outputColName = SemanticAnalyzer.getColumnInternalName(i);
         gbOutputNames.add(outputColName);
         
         String field = Utilities.ReduceField.VALUE.toString() + "." + outputColName;
         columnExprMap.put(field, rsValExpr);
         gbySchema.add(new ColumnInfo(field, rsColInfo.getType(), "", false));
       }
   ````
   Also, that didn't require changes in `ParallelEdgeFixer.colMappingInverseKeys`; `exprMap` wasn't empty.
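
   The effect of the key naming on the invertibility check can be shown with a small standalone sketch (the column and key names are made up for illustration): a schema-name lookup of the form `allMatch(exprMap::containsKey)` only succeeds when the map keys carry the same `VALUE.` prefix as the schema column names.

```java
import java.util.List;
import java.util.Map;

public class KeyPrefixSketch {
    public static void main(String[] args) {
        // Output schema column names as a ReduceSink might expose them.
        List<String> schemaCols = List.of("VALUE._col0", "VALUE._col1");

        // Keys stored without the VALUE prefix fail the lookup...
        Map<String, String> plainKeys = Map.of("_col0", "expr0", "_col1", "expr1");
        System.out.println(schemaCols.stream().allMatch(plainKeys::containsKey)); // false

        // ...while prefixed keys satisfy it.
        Map<String, String> prefixedKeys =
            Map.of("VALUE._col0", "expr0", "VALUE._col1", "expr1");
        System.out.println(schemaCols.stream().allMatch(prefixedKeys::containsKey)); // true
    }
}
```

   This is why keeping or dropping the prefix in one of the two methods changes whether the column-expression map lines up with the operator's schema.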





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4043:
URL: https://github.com/apache/hive/pull/4043#issuecomment-1822793048

   Kudos, SonarCloud Quality Gate passed!&nbsp; &nbsp; [![Quality Gate passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png 'Quality Gate passed')](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4043)
   
   [![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG) [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG)  
   [![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY)  
   [![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT)  
   [![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL) [7 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL)
   
   [![No Coverage information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png 'No Coverage information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4043&metric=coverage&view=list) No Coverage information  
   [![No Duplication information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png 'No Duplication information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4043&metric=duplicated_lines_density&view=list) No Duplication information
   
   ![warning](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/message_warning-16px.png 'warning') The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
   Read more [here](https://docs.sonarsource.com/sonarcloud/appendices/scanner-environment/)
   
   
   




Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4043:
URL: https://github.com/apache/hive/pull/4043#issuecomment-1818931938

   Kudos, SonarCloud Quality Gate passed!&nbsp; &nbsp; [![Quality Gate passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png 'Quality Gate passed')](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4043)
   
   [![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG) [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG)  
   [![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY)  
   [![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT)  
   [![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL) [1 Code Smell](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL)
   
   [![No Coverage information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png 'No Coverage information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4043&metric=coverage&view=list) No Coverage information  
   [![No Duplication information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png 'No Duplication information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4043&metric=duplicated_lines_density&view=list) No Duplication information
   
   ![warning](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/message_warning-16px.png 'warning') The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
   Read more [here](https://docs.sonarsource.com/sonarcloud/appendices/scanner-environment/)
   
   
   




[GitHub] [hive] sonarcloud[bot] commented on pull request #4043: HIVE-27006: Fix ParallelEdgeFixer

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4043:
URL: https://github.com/apache/hive/pull/4043#issuecomment-1554753819

   Kudos, SonarCloud Quality Gate passed!&nbsp; &nbsp; [![Quality Gate passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png 'Quality Gate passed')](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4043)
   
   [![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG) [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG)  
   [![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY)  
   [![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT)  
   [![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL) [10 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL)
   
   [![No Coverage information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png 'No Coverage information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4043&metric=coverage&view=list) No Coverage information  
   [![No Duplication information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png 'No Duplication information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4043&metric=duplicated_lines_density&view=list) No Duplication information
   
   




Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1386417851


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ParallelEdgeFixer.java:
##########
@@ -325,6 +331,11 @@ public static Optional<Set<String>> colMappingInverseKeys(ReduceSinkOperator rs)
     Map<String, String> ret = new HashMap<String, String>();
     Map<String, ExprNodeDesc> exprMap = rs.getColumnExprMap();
     Set<String> neededColumns = new HashSet<String>();
+
+    if (!rs.getSchema().getColumnNames().stream().allMatch(exprMap::containsKey)) {

Review Comment:
   With that optimization, should we remove `neededColumns` completely?





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "ngsg (via GitHub)" <gi...@apache.org>.
ngsg commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1387531248


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ParallelEdgeFixer.java:
##########
@@ -325,6 +331,11 @@ public static Optional<Set<String>> colMappingInverseKeys(ReduceSinkOperator rs)
     Map<String, String> ret = new HashMap<String, String>();
     Map<String, ExprNodeDesc> exprMap = rs.getColumnExprMap();
     Set<String> neededColumns = new HashSet<String>();
+
+    if (!rs.getSchema().getColumnNames().stream().allMatch(exprMap::containsKey)) {

Review Comment:
   I forgot to push my local changes; please check the commit 27f2d73 for the requested changes.





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388026483


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -796,31 +792,29 @@ private void createFinalRsForSemiJoinOp(
           ExprNodeDesc key, String keyBaseAlias, ExprNodeDesc colExpr,
           boolean isHint) throws SemanticException {
     ArrayList<String> gbOutputNames = new ArrayList<>();
-    // One each for min, max and bloom filter
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(0));
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(1));
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(2));
-
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
-    for (int i = 0; i < gbOutputNames.size() - 1; i++) {
-      ExprNodeColumnDesc expr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos++), "", false);
-      rsValueCols.add(expr);
+    Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    for (int colPos = 0; colPos < gb.getSchema().getSignature().size(); colPos++) {

Review Comment:
   Could we please extract `gb.getSchema().getSignature()` into a local variable?





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "ngsg (via GitHub)" <gi...@apache.org>.
ngsg commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388034064


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ParallelEdgeFixer.java:
##########
@@ -325,6 +331,11 @@ public static Optional<Set<String>> colMappingInverseKeys(ReduceSinkOperator rs)
     Map<String, String> ret = new HashMap<String, String>();
     Map<String, ExprNodeDesc> exprMap = rs.getColumnExprMap();
     Set<String> neededColumns = new HashSet<String>();
+
+    if (!rs.getSchema().getColumnNames().stream().allMatch(exprMap::containsKey)) {

Review Comment:
   The added qfile, `sharedwork_semi_2.q`, reproduces the issue and does not require large data.





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388381233


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ParallelEdgeFixer.java:
##########
@@ -324,26 +330,25 @@ private static String extractColumnName(ExprNodeDesc expr) throws SemanticExcept
   public static Optional<Set<String>> colMappingInverseKeys(ReduceSinkOperator rs) {
     Map<String, String> ret = new HashMap<String, String>();
     Map<String, ExprNodeDesc> exprMap = rs.getColumnExprMap();
-    Set<String> neededColumns = new HashSet<String>();
+
+    if (!rs.getSchema().getColumnNames().stream().allMatch(exprMap::containsKey)) {
+      // Cannot invert RS because exprMap does not contain all of the RS's input columns.
+      return Optional.empty();
+    }
+
     try {
       for (Entry<String, ExprNodeDesc> e : exprMap.entrySet()) {
         String columnName = extractColumnName(e.getValue());
         if (rs.getSchema().getColumnInfo(e.getKey()) == null) {
-          // ignore incorrectly mapped columns (if there's any) - but require its input to be present
-          neededColumns.add(columnName);
-        } else {
-          ret.put(columnName, e.getKey());
+          // Cannot invert RS because the column needed by expression is not mapped correctly.
+          return Optional.empty();

Review Comment:
   It looks like `exprMap` might contain duplicate source columns {columnName} with different mappings {e.getKey()}, so I think we need to restore the previous version.
   How about this:
   ````
       Set<String> neededColumns = new HashSet<String>();
       try {
         for (Entry<String, ExprNodeDesc> e : exprMap.entrySet()) {
           String columnName = extractColumnName(e.getValue());
           if (rs.getSchema().getColumnInfo(e.getKey()) == null) {
             // ignore incorrectly mapped columns (if there's any) - but require its input to be present
             neededColumns.add(columnName);
           } else {
             ret.put(columnName, e.getKey());
           }
         }
         neededColumns.removeAll(ret.keySet());
         if (!neededColumns.isEmpty() || exprMap.isEmpty()) {
           // There is no way to compute all parts of neededColumns
           return Optional.empty();
         }
    } // the enclosing try is closed here; the existing catch block is unchanged
   
   ````
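   For readers outside the Hive codebase, the inversion-with-fallback logic in the snippet above can be exercised in isolation. The following is a minimal standalone sketch (editor's illustration, not Hive code): a plain `Map<String, String>` and `Set<String>` stand in for the operator's `columnExprMap` and `RowSchema`, and `invert` stands in for `colMappingInverseKeys`.

   ```java
   import java.util.HashMap;
   import java.util.HashSet;
   import java.util.Map;
   import java.util.Optional;
   import java.util.Set;

   // Minimal sketch of inverting an {outputCol -> sourceCol} mapping while
   // tolerating entries whose output column is missing from the schema, as long
   // as their source column is recovered through another, correctly mapped entry.
   public class InverseSketch {
       static Optional<Map<String, String>> invert(Map<String, String> exprMap, Set<String> schemaCols) {
           Map<String, String> ret = new HashMap<>();
           Set<String> neededColumns = new HashSet<>();
           for (Map.Entry<String, String> e : exprMap.entrySet()) {
               String sourceCol = e.getValue();
               if (!schemaCols.contains(e.getKey())) {
                   // incorrectly mapped output column: ignore it, but remember that
                   // its source column must still be produced by some other entry
                   neededColumns.add(sourceCol);
               } else {
                   ret.put(sourceCol, e.getKey());
               }
           }
           neededColumns.removeAll(ret.keySet());
           if (!neededColumns.isEmpty() || exprMap.isEmpty()) {
               // some source column cannot be recovered: the mapping is not invertible
               return Optional.empty();
           }
           return Optional.of(ret);
       }
   }
   ```

   In this sketch, a duplicate source column, e.g. both `KEY._col0` and `VALUE._col0` mapping to `c1`, is tolerated: the badly mapped entry is ignored because a correctly mapped entry already covers `c1`, which is the behavior the review asks to restore.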





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "ngsg (via GitHub)" <gi...@apache.org>.
ngsg commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1390737001


##########
ql/src/test/results/clientpositive/perf/tpcds30tb/tez/query2.q.out:
##########
@@ -11,7 +11,7 @@ STAGE PLANS:
         Map 11 <- Map 6 (BROADCAST_EDGE), Union 12 (CONTAINS)
         Map 14 <- Map 6 (BROADCAST_EDGE), Union 12 (CONTAINS)
         Map 5 <- Map 6 (BROADCAST_EDGE), Union 2 (CONTAINS)
-        Map 6 <- Reducer 8 (BROADCAST_EDGE), Reducer 9 (BROADCAST_EDGE)
+        Map 6 <- Reducer 10 (BROADCAST_EDGE), Reducer 8 (BROADCAST_EDGE), Reducer 9 (BROADCAST_EDGE)

Review Comment:
   The current ParallelEdgeFixer does not update RuntimeValueInfo (RVI) correctly. Because TezCompiler creates SemiJoin edges based on RVI, this issue leads to the absence of some edges.
   
   The edge between Map 6 and Reducer 10 is one of the disappeared edges. After SWO, Map 6 has two incoming SemiJoin edges that come from the same reducer, so PEF inserts a SEL-RS pair to break the parallel edge, but it does not update the RVI of the parent of the inserted SEL-RS. That is why the previous plan does not contain an edge between Map 6 and Reducer 10.
   
   I attached three operator graphs for better understanding. All graphs were generated during the TPCDS30TB query2 test.
   Before applying PEF:
   <img width="395" alt="master before dot 1" src="https://github.com/apache/hive/assets/29757139/0a096c15-6286-48d6-a348-e7201b4a127d">
   
   After applying current PEF:
   <img width="454" alt="master after dot 1" src="https://github.com/apache/hive/assets/29757139/50d11e34-3c94-4812-b5dd-6b7804899b4b">
   
   After applying modified PEF:
   <img width="264" alt="h27006 after dot 1" src="https://github.com/apache/hive/assets/29757139/58e2f472-7dfb-4688-b639-883fe9b8acff">
   
   
   
   





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1398486962


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -678,38 +678,34 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
     ArrayList<ColumnInfo> groupbyColInfos = new ArrayList<ColumnInfo>();
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(0), key.getTypeInfo(), "", false));
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(1), key.getTypeInfo(), "", false));
-    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), key.getTypeInfo(), "", false));
+    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), TypeInfoFactory.binaryTypeInfo, "", false));
 
     GroupByOperator groupByOp = (GroupByOperator)OperatorFactory.getAndMakeChild(
             groupBy, new RowSchema(groupbyColInfos), selectOp);
 
     groupByOp.setColumnExprMap(new HashMap<String, ExprNodeDesc>());
 
     // Get the column names of the aggregations for reduce sink
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
     Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
-    for (int i = 0; i < aggs.size() - 1; i++) {
-      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos), "", false);
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    for (int colPos = 0; colPos < aggs.size(); colPos++) {
+      TypeInfo typInfo = groupbyColInfos.get(colPos).getType();
+      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(typInfo, gbOutputNames.get(colPos), "", false);
       rsValueCols.add(colExpr);
-      columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-      colPos++;
-    }
+      columnExprMap.put(Utilities.ReduceField.VALUE + "." + gbOutputNames.get(colPos), colExpr);
 
-    // Bloom Filter uses binary
-    ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(TypeInfoFactory.binaryTypeInfo,
-        gbOutputNames.get(colPos), "", false);
-    rsValueCols.add(colExpr);
-    columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-    colPos++;
+      ColumnInfo colInfo =

Review Comment:
   hi @ngsg, thanks for the extended explanation!
   1. That's clear, I've debugged the code as well, thanks for sharing.
   2. Unless you have a test that emulates PEF applied to the intermediate RS operator, please restore the original code. Also, could you please apply the code refactor done in the snippet above?
   3. Ok





[GitHub] [hive] github-actions[bot] closed pull request #4043: HIVE-27006: Fix ParallelEdgeFixer

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #4043: HIVE-27006: Fix ParallelEdgeFixer
URL: https://github.com/apache/hive/pull/4043




[GitHub] [hive] ayushtkn commented on pull request #4043: HIVE-27006: Fix ParallelEdgeFixer

Posted by "ayushtkn (via GitHub)" <gi...@apache.org>.
ayushtkn commented on PR #4043:
URL: https://github.com/apache/hive/pull/4043#issuecomment-1722308267

   I have reopened the PR. Let's see the Jenkins result.
   I haven't started looking into this yet; it is not my area of expertise, but I will try to give it a shot, or get some help from people familiar with this part of the code.
   
   @ngsg why are the basic stats changing in all the other q files? From the ticket it seems only one TPCDS query is broken; is this fixing or changing something with stats as well?




Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1386417851


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ParallelEdgeFixer.java:
##########
@@ -325,6 +331,11 @@ public static Optional<Set<String>> colMappingInverseKeys(ReduceSinkOperator rs)
     Map<String, String> ret = new HashMap<String, String>();
     Map<String, ExprNodeDesc> exprMap = rs.getColumnExprMap();
     Set<String> neededColumns = new HashSet<String>();
+
+    if (!rs.getSchema().getColumnNames().stream().allMatch(exprMap::containsKey)) {

Review Comment:
   with that optimization, should we remove the `neededColumns` completely and add a comment?
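   As a side note for readers, the guard proposed in the diff above is a plain stream check. This is an editor's standalone sketch (not Hive code) of the same `allMatch`/`containsKey` pattern on ordinary collections:

   ```java
   import java.util.List;
   import java.util.Map;

   // Sketch of the guard discussed above: attempt the inversion only when every
   // schema column name has a corresponding entry in the column-expression map.
   public class GuardSketch {
       static boolean coversSchema(List<String> schemaColumnNames, Map<String, String> exprMap) {
           // mirrors rs.getSchema().getColumnNames().stream().allMatch(exprMap::containsKey)
           return schemaColumnNames.stream().allMatch(exprMap::containsKey);
       }
   }
   ```

   Note that `allMatch` on an empty stream returns true, so an RS with an empty schema passes this guard; in the Hive method that case is still rejected separately when `exprMap` is empty.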





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1387994260


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -678,38 +678,34 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
     ArrayList<ColumnInfo> groupbyColInfos = new ArrayList<ColumnInfo>();
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(0), key.getTypeInfo(), "", false));
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(1), key.getTypeInfo(), "", false));
-    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), key.getTypeInfo(), "", false));
+    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), TypeInfoFactory.binaryTypeInfo, "", false));
 
     GroupByOperator groupByOp = (GroupByOperator)OperatorFactory.getAndMakeChild(
             groupBy, new RowSchema(groupbyColInfos), selectOp);
 
     groupByOp.setColumnExprMap(new HashMap<String, ExprNodeDesc>());
 
     // Get the column names of the aggregations for reduce sink
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
     Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
-    for (int i = 0; i < aggs.size() - 1; i++) {
-      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos), "", false);
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    for (int colPos = 0; colPos < aggs.size(); colPos++) {
+      TypeInfo typInfo = groupbyColInfos.get(colPos).getType();
+      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(typInfo, gbOutputNames.get(colPos), "", false);
       rsValueCols.add(colExpr);
-      columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-      colPos++;
-    }
+      columnExprMap.put(Utilities.ReduceField.VALUE + "." + gbOutputNames.get(colPos), colExpr);
 
-    // Bloom Filter uses binary
-    ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(TypeInfoFactory.binaryTypeInfo,
-        gbOutputNames.get(colPos), "", false);
-    rsValueCols.add(colExpr);
-    columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-    colPos++;
+      ColumnInfo colInfo =

Review Comment:
   why do we need this, can we use groupByOp.getSchema()?





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1400802755


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -796,31 +787,29 @@ private void createFinalRsForSemiJoinOp(
           ExprNodeDesc key, String keyBaseAlias, ExprNodeDesc colExpr,
           boolean isHint) throws SemanticException {
     ArrayList<String> gbOutputNames = new ArrayList<>();
-    // One each for min, max and bloom filter
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(0));
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(1));
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(2));
-
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
-    for (int i = 0; i < gbOutputNames.size() - 1; i++) {
-      ExprNodeColumnDesc expr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos++), "", false);
-      rsValueCols.add(expr);
+    Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    List<ColumnInfo> gbySchema = gb.getSchema().getSignature();
+    for (int colPos = 0; colPos < gbySchema.size(); colPos++) {

Review Comment:
   please use for-each
   ````
       for (ColumnInfo gbColInfo : gb.getSchema().getSignature()) {
         String outputColName = gbColInfo.getInternalName();
         gbOutputNames.add(outputColName);
   
         ExprNodeColumnDesc rsValExpr = new ExprNodeColumnDesc(gbColInfo.getType(), outputColName, "", false);
         rsValueCols.add(rsValExpr);
   
         String field = Utilities.ReduceField.VALUE.toString() + "." + outputColName;
         columnExprMap.put(field, rsValExpr);
         rsColInfos.add(new ColumnInfo(field, gbColInfo.getType(), "", false));
       }
   ````





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4043:
URL: https://github.com/apache/hive/pull/4043#issuecomment-1822199939

   Kudos, SonarCloud Quality Gate passed! (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4043)
   
   0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 9 Code Smells.
   No Coverage information. No Duplication information.
   
   Warning: the version of Java (11.0.8) used to run this analysis is deprecated and will soon stop being accepted. Please update to at least Java 17.
   Read more: https://docs.sonarsource.com/sonarcloud/appendices/scanner-environment/
   
   
   




Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1400805808


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -678,38 +678,34 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
     ArrayList<ColumnInfo> groupbyColInfos = new ArrayList<ColumnInfo>();
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(0), key.getTypeInfo(), "", false));
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(1), key.getTypeInfo(), "", false));
-    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), key.getTypeInfo(), "", false));
+    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), TypeInfoFactory.binaryTypeInfo, "", false));
 
     GroupByOperator groupByOp = (GroupByOperator)OperatorFactory.getAndMakeChild(
             groupBy, new RowSchema(groupbyColInfos), selectOp);
 
     groupByOp.setColumnExprMap(new HashMap<String, ExprNodeDesc>());
 
     // Get the column names of the aggregations for reduce sink
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
     Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
-    for (int i = 0; i < aggs.size() - 1; i++) {
-      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos), "", false);
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    for (int colPos = 0; colPos < aggs.size(); colPos++) {
+      TypeInfo typInfo = groupbyColInfos.get(colPos).getType();
+      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(typInfo, gbOutputNames.get(colPos), "", false);
       rsValueCols.add(colExpr);
-      columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-      colPos++;
-    }
+      columnExprMap.put(Utilities.ReduceField.VALUE + "." + gbOutputNames.get(colPos), colExpr);
 
-    // Bloom Filter uses binary
-    ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(TypeInfoFactory.binaryTypeInfo,
-        gbOutputNames.get(colPos), "", false);
-    rsValueCols.add(colExpr);
-    columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-    colPos++;
+      ColumnInfo colInfo =

Review Comment:
   @ngsg, there is still 1 leftover + no need for index in for-loop (added comment). Please take a look





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1401790387


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -796,31 +787,29 @@ private void createFinalRsForSemiJoinOp(
           ExprNodeDesc key, String keyBaseAlias, ExprNodeDesc colExpr,
           boolean isHint) throws SemanticException {
     ArrayList<String> gbOutputNames = new ArrayList<>();
-    // One each for min, max and bloom filter
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(0));
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(1));
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(2));
-
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
-    for (int i = 0; i < gbOutputNames.size() - 1; i++) {
-      ExprNodeColumnDesc expr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos++), "", false);
-      rsValueCols.add(expr);
+    Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    List<ColumnInfo> gbySchema = gb.getSchema().getSignature();

Review Comment:
   @ngsg, please remove `gbySchema` since you replaced it with `rsColInfos`.





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1401790387


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -796,31 +787,29 @@ private void createFinalRsForSemiJoinOp(
           ExprNodeDesc key, String keyBaseAlias, ExprNodeDesc colExpr,
           boolean isHint) throws SemanticException {
     ArrayList<String> gbOutputNames = new ArrayList<>();
-    // One each for min, max and bloom filter
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(0));
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(1));
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(2));
-
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
-    for (int i = 0; i < gbOutputNames.size() - 1; i++) {
-      ExprNodeColumnDesc expr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos++), "", false);
-      rsValueCols.add(expr);
+    Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    List<ColumnInfo> gbySchema = gb.getSchema().getSignature();

Review Comment:
   @ngsg, please remove `gbySchema` and change the `rsColInfos` declaration to use the interface type.





[GitHub] [hive] sonarcloud[bot] commented on pull request #4043: HIVE-27006: Fix ParallelEdgeFixer

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4043:
URL: https://github.com/apache/hive/pull/4043#issuecomment-1434039921

   Kudos, SonarCloud Quality Gate passed! (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4043)
   
   0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 1 Code Smell.
   No Coverage information. No Duplication information.
   
   




Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388381233


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ParallelEdgeFixer.java:
##########
@@ -324,26 +330,25 @@ private static String extractColumnName(ExprNodeDesc expr) throws SemanticExcept
   public static Optional<Set<String>> colMappingInverseKeys(ReduceSinkOperator rs) {
     Map<String, String> ret = new HashMap<String, String>();
     Map<String, ExprNodeDesc> exprMap = rs.getColumnExprMap();
-    Set<String> neededColumns = new HashSet<String>();
+
+    if (!rs.getSchema().getColumnNames().stream().allMatch(exprMap::containsKey)) {
+      // Cannot invert RS because exprMap does not contain all of the RS's input columns.
+      return Optional.empty();
+    }
+
     try {
       for (Entry<String, ExprNodeDesc> e : exprMap.entrySet()) {
         String columnName = extractColumnName(e.getValue());
         if (rs.getSchema().getColumnInfo(e.getKey()) == null) {
-          // ignore incorrectly mapped columns (if there's any) - but require its input to be present
-          neededColumns.add(columnName);
-        } else {
-          ret.put(columnName, e.getKey());
+          // Cannot invert RS because the column needed by expression is not mapped correctly.
+          return Optional.empty();

Review Comment:
   it looks like `exprMap` might have duplicate columns (`columnName`) with different mappings (`e.getKey()`), so I think we need to restore the previous version.
   how about this:
   ````
   List<String> neededColumns = new ArrayList<>(rs.getSchema().getColumnNames());
   try {
     for (Entry<String, ExprNodeDesc> e : exprMap.entrySet()) {
       String columnName = extractColumnName(e.getValue());
       if (rs.getSchema().getColumnInfo(e.getKey()) != null) {
         // ignore incorrectly mapped columns (if there's any) - but require its input to be present
         ret.put(columnName, e.getKey());
       }
     }
     neededColumns.removeAll(ret.keySet());
     if (!neededColumns.isEmpty()) {
       // There is no way to compute all parts of neededColumns
       return Optional.empty();
     }
   ````
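   For readers outside the Hive codebase, the inversion check being discussed can be sketched with plain collections. This is a minimal sketch under assumed semantics, not the actual `ParallelEdgeFixer`/`ReduceSinkOperator` API: `invert` and `schemaColumns` are hypothetical names, and plain `String` values stand in for `ExprNodeDesc` expressions.

   ```java
   import java.util.*;

   public class InverseMappingSketch {
     // Try to invert an outputColumn -> inputColumn mapping. Entries whose
     // output column is not part of the schema (e.g. stale duplicates) are
     // skipped; the inversion fails only when some schema column ends up
     // with no usable mapping at all.
     static Optional<Map<String, String>> invert(Map<String, String> exprMap,
                                                 Set<String> schemaColumns) {
       Map<String, String> inverse = new HashMap<>();
       for (Map.Entry<String, String> e : exprMap.entrySet()) {
         if (schemaColumns.contains(e.getKey())) {
           inverse.put(e.getValue(), e.getKey());
         }
       }
       Set<String> covered = new HashSet<>(inverse.values());
       if (!covered.containsAll(schemaColumns)) {
         return Optional.empty(); // some schema column cannot be computed
       }
       return Optional.of(inverse);
     }

     public static void main(String[] args) {
       Map<String, String> exprMap = new HashMap<>();
       exprMap.put("_col0", "a");
       exprMap.put("_col1", "b");
       exprMap.put("stale", "a"); // duplicate input column, not in schema
       Set<String> schema = new HashSet<>(Arrays.asList("_col0", "_col1"));
       System.out.println(invert(exprMap, schema).isPresent());  // true
       // a schema column without any mapping makes the inversion fail
       System.out.println(invert(exprMap,
           new HashSet<>(Arrays.asList("_col0", "_col1", "_col2"))).isPresent());  // false
     }
   }
   ```

   The `stale` entry illustrates the duplicate-mapping concern above: ignoring entries that are not in the schema, rather than bailing out on them, still allows the inversion to succeed when all schema columns are covered.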





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388381233


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ParallelEdgeFixer.java:
##########
@@ -324,26 +330,25 @@ private static String extractColumnName(ExprNodeDesc expr) throws SemanticExcept
   public static Optional<Set<String>> colMappingInverseKeys(ReduceSinkOperator rs) {
     Map<String, String> ret = new HashMap<String, String>();
     Map<String, ExprNodeDesc> exprMap = rs.getColumnExprMap();
-    Set<String> neededColumns = new HashSet<String>();
+
+    if (!rs.getSchema().getColumnNames().stream().allMatch(exprMap::containsKey)) {
+      // Cannot invert RS because exprMap does not contain all of the RS's input columns.
+      return Optional.empty();
+    }
+
     try {
       for (Entry<String, ExprNodeDesc> e : exprMap.entrySet()) {
         String columnName = extractColumnName(e.getValue());
         if (rs.getSchema().getColumnInfo(e.getKey()) == null) {
-          // ignore incorrectly mapped columns (if there's any) - but require its input to be present
-          neededColumns.add(columnName);
-        } else {
-          ret.put(columnName, e.getKey());
+          // Cannot invert RS because the column needed by expression is not mapped correctly.
+          return Optional.empty();

Review Comment:
   it looks like `exprMap` might have duplicate columns (`columnName`) with different mappings (`e.getKey()`), so I think we need to restore the previous version.
   how about this:
   ````
   for (Entry<String, ExprNodeDesc> e : exprMap.entrySet()) {
     String columnName = extractColumnName(e.getValue());
     if (rs.getSchema().getColumnInfo(e.getKey()) != null) {
       // ignore incorrectly mapped columns (if there's any) - but require its input to be present
       ret.put(columnName, e.getKey());
     }
   }
   if (!ret.keySet().containsAll(rs.getSchema().getColumnNames())) {
     // There is no way to compute all of the needed columns
     return Optional.empty();
   }
   ````





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388267013


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ParallelEdgeFixer.java:
##########
@@ -325,6 +331,11 @@ public static Optional<Set<String>> colMappingInverseKeys(ReduceSinkOperator rs)
     Map<String, String> ret = new HashMap<String, String>();
     Map<String, ExprNodeDesc> exprMap = rs.getColumnExprMap();
     Set<String> neededColumns = new HashSet<String>();
+
+    if (!rs.getSchema().getColumnNames().stream().allMatch(exprMap::containsKey)) {

Review Comment:
   👍 thanks






Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388046180


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -678,38 +678,34 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
     ArrayList<ColumnInfo> groupbyColInfos = new ArrayList<ColumnInfo>();
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(0), key.getTypeInfo(), "", false));
     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(1), key.getTypeInfo(), "", false));
-    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), key.getTypeInfo(), "", false));
+    groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), TypeInfoFactory.binaryTypeInfo, "", false));
 
     GroupByOperator groupByOp = (GroupByOperator)OperatorFactory.getAndMakeChild(
             groupBy, new RowSchema(groupbyColInfos), selectOp);
 
     groupByOp.setColumnExprMap(new HashMap<String, ExprNodeDesc>());
 
     // Get the column names of the aggregations for reduce sink
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
     Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
-    for (int i = 0; i < aggs.size() - 1; i++) {
-      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos), "", false);
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    for (int colPos = 0; colPos < aggs.size(); colPos++) {
+      TypeInfo typInfo = groupbyColInfos.get(colPos).getType();
+      ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(typInfo, gbOutputNames.get(colPos), "", false);
       rsValueCols.add(colExpr);
-      columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-      colPos++;
-    }
+      columnExprMap.put(Utilities.ReduceField.VALUE + "." + gbOutputNames.get(colPos), colExpr);
 
-    // Bloom Filter uses binary
-    ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(TypeInfoFactory.binaryTypeInfo,
-        gbOutputNames.get(colPos), "", false);
-    rsValueCols.add(colExpr);
-    columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-    colPos++;
+      ColumnInfo colInfo =

Review Comment:
   sorry for the dumb question, but why do we need the `ReduceField.VALUE` prefix in the RS operator? if it is required, how did it even work before?
   in `SemiJoinReductionMerge.createGroupByAggregationParameters` the `ExprNodeColumnDesc` colName is prefixed as well. maybe @kgyrtkirk can help
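   As background for the question above: a reduce-side row in Hive is conceptually a `struct<key:..., value:...>`, so downstream operators resolve reduce-sink output columns by prefixed names like `KEY.<name>` or `VALUE.<name>`. The naming convention can be sketched with a stand-in enum (hypothetical, not the real `Utilities.ReduceField` class):

   ```java
   import java.util.*;

   public class ReduceFieldPrefixSketch {
     // Stand-in mimicking Hive's Utilities.ReduceField: KEY/VALUE are the two
     // top-level fields of a reduce-side row.
     enum ReduceField { KEY, VALUE }

     // Build the prefixed name used as a columnExprMap key for a value column.
     static String valueColumn(String internalName) {
       return ReduceField.VALUE + "." + internalName;
     }

     public static void main(String[] args) {
       // Map each prefixed output name to its source column name.
       Map<String, String> columnExprMap = new LinkedHashMap<>();
       for (String col : Arrays.asList("_col0", "_col1", "_col2")) {
         columnExprMap.put(valueColumn(col), col);
       }
       System.out.println(columnExprMap.keySet());
       // → [VALUE._col0, VALUE._col1, VALUE._col2]
     }
   }
   ```

   Under this convention, a `columnExprMap` keyed with bare names would not line up with a row schema that uses `VALUE.`-prefixed names, which is the mismatch the PR's change to `DynamicPartitionPruningOptimization` appears to address.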





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "ngsg (via GitHub)" <gi...@apache.org>.
ngsg commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1390719384


##########
ql/src/test/queries/clientpositive/sharedwork_semi_2.q:
##########
@@ -0,0 +1,104 @@
+set hive.auto.convert.anti.join=false;
+set hive.auto.convert.join.noconditionaltask.size=1145044992;
+set hive.auto.convert.join.noconditionaltask=true;
+set hive.auto.convert.join=true;
+set hive.auto.convert.sortmerge.join.to.mapjoin=true;
+set hive.auto.convert.sortmerge.join=true;
+set hive.cbo.enable=true;
+set hive.convert.join.bucket.mapjoin.tez=true;
+set hive.enforce.sortmergebucketmapjoin=true;
+set hive.exec.dynamic.partition.mode=nonstrict;
+set hive.exec.dynamic.partition=true;
+set hive.exec.max.dynamic.partitions.pernode=4000;
+set hive.exec.max.dynamic.partitions=10000;
+set hive.exec.parallel.thread.number=32;
+set hive.exec.parallel=false;

Review Comment:
   removed irrelevant configurations



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java:
##########
@@ -1916,18 +1916,12 @@ public RelaxedVertexEdgePredicate(EnumSet<EdgeType> nonTraverseableEdgeTypes) {
 
     @Override
     public boolean accept(Operator<?> s, Operator<?> t, OpEdge opEdge) {
-      if (!traverseableEdgeTypes.contains(opEdge.getEdgeType())) {
-        return true;
-      }
-      if (s instanceof ReduceSinkOperator) {
-        ReduceSinkOperator rs = (ReduceSinkOperator) s;
-        if (!ParallelEdgeFixer.colMappingInverseKeys(rs).isPresent()) {
-          return true;
-        }
-      }
-      return false;
-    }
+      boolean notTraverseable = !traverseableEdgeTypes.contains(opEdge.getEdgeType());
+      boolean notInvertible = (s instanceof ReduceSinkOperator) &&
+          !ParallelEdgeFixer.colMappingInverseKeys((ReduceSinkOperator) s).isPresent();
 
+      return notTraverseable || notInvertible;

Review Comment:
   I reverted it.





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "ngsg (via GitHub)" <gi...@apache.org>.
ngsg commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1390718984


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -796,31 +792,29 @@ private void createFinalRsForSemiJoinOp(
           ExprNodeDesc key, String keyBaseAlias, ExprNodeDesc colExpr,
           boolean isHint) throws SemanticException {
     ArrayList<String> gbOutputNames = new ArrayList<>();
-    // One each for min, max and bloom filter
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(0));
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(1));
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(2));
-
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
-    for (int i = 0; i < gbOutputNames.size() - 1; i++) {
-      ExprNodeColumnDesc expr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos++), "", false);
-      rsValueCols.add(expr);
+    Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    for (int colPos = 0; colPos < gb.getSchema().getSignature().size(); colPos++) {

Review Comment:
   OK, I introduced a new local variable named `gbySchema`.





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4043:
URL: https://github.com/apache/hive/pull/4043#issuecomment-1807738850

   Kudos, SonarCloud Quality Gate passed! [![Quality Gate passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png 'Quality Gate passed')](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4043)
   
   [![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG) [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=BUG)  
   [![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=VULNERABILITY)  
   [![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4043&resolved=false&types=SECURITY_HOTSPOT)  
   [![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL) [1 Code Smell](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4043&resolved=false&types=CODE_SMELL)
   
   [![No Coverage information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png 'No Coverage information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4043&metric=coverage&view=list) No Coverage information  
   [![No Duplication information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png 'No Duplication information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4043&metric=duplicated_lines_density&view=list) No Duplication information
   
   ![warning](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/message_warning-16px.png 'warning') The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
   Read more [here](https://docs.sonarcloud.io/appendices/scanner-environment/)
   
   
   




Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "ngsg (via GitHub)" <gi...@apache.org>.
ngsg commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1401482531


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -782,7 +773,7 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
             groupByMemoryUsage, memoryThreshold, minReductionHashAggr, minReductionHashAggrLowerBound,
             null, false, 0, false);
     GroupByOperator groupByOpFinal = (GroupByOperator)OperatorFactory.getAndMakeChild(
-            groupByDescFinal, new RowSchema(rsOp.getSchema()), rsOp);
+            groupByDescFinal, new RowSchema(groupByOp.getSchema()), rsOp);

Review Comment:
   reverted to previous version





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1400802755


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -796,31 +787,29 @@ private void createFinalRsForSemiJoinOp(
           ExprNodeDesc key, String keyBaseAlias, ExprNodeDesc colExpr,
           boolean isHint) throws SemanticException {
     ArrayList<String> gbOutputNames = new ArrayList<>();
-    // One each for min, max and bloom filter
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(0));
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(1));
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(2));
-
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
-    for (int i = 0; i < gbOutputNames.size() - 1; i++) {
-      ExprNodeColumnDesc expr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos++), "", false);
-      rsValueCols.add(expr);
+    Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    List<ColumnInfo> gbySchema = gb.getSchema().getSignature();
+    for (int colPos = 0; colPos < gbySchema.size(); colPos++) {

Review Comment:
   use for-each
   ````
   Map<String, ExprNodeDesc> columnExprMap = new HashMap<>();
   ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
   for (ColumnInfo gbColInfo : gb.getSchema().getSignature()) {
     gbOutputNames.add(gbColInfo.getInternalName());

     ExprNodeColumnDesc rsValExpr = new ExprNodeColumnDesc(gbColInfo.getType(), gbColInfo.getInternalName(), "", false);
     rsValueCols.add(rsValExpr);

     String field = Utilities.ReduceField.VALUE.toString() + "." + gbColInfo.getInternalName();
     columnExprMap.put(field, rsValExpr);
     rsColInfos.add(new ColumnInfo(field, gbColInfo.getType(), "", false));
   }
   ````
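   The shape of the refactor suggested above (replacing a positional loop with one for-each pass over the group-by schema) can be sketched with plain types. This is a hypothetical stand-in, not the Hive `ColumnInfo`/`ExprNodeDesc` classes:

   ```java
   import java.util.*;

   public class ForEachRefactorSketch {
     // Stand-in for Hive's ColumnInfo: just an internal name and a type name.
     record ColumnInfo(String internalName, String type) {}

     // Build the "VALUE.<name>" -> source-column map in a single for-each
     // pass, with no index bookkeeping.
     static Map<String, String> buildColumnExprMap(List<ColumnInfo> signature) {
       Map<String, String> columnExprMap = new LinkedHashMap<>();
       for (ColumnInfo ci : signature) {
         columnExprMap.put("VALUE." + ci.internalName(), ci.internalName());
       }
       return columnExprMap;
     }

     public static void main(String[] args) {
       List<ColumnInfo> signature = List.of(
           new ColumnInfo("_col0", "bigint"),   // min
           new ColumnInfo("_col1", "bigint"),   // max
           new ColumnInfo("_col2", "binary"));  // bloom filter
       System.out.println(buildColumnExprMap(signature).keySet());
       // → [VALUE._col0, VALUE._col1, VALUE._col2]
     }
   }
   ```

   Driving all the parallel collections from the schema itself also keeps the column types in sync (e.g. `binary` for the bloom filter column) instead of assuming the key's type for every position.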





Re: [PR] HIVE-27006: Fix ParallelEdgeFixer [hive]

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1401790387


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -796,31 +787,29 @@ private void createFinalRsForSemiJoinOp(
           ExprNodeDesc key, String keyBaseAlias, ExprNodeDesc colExpr,
           boolean isHint) throws SemanticException {
     ArrayList<String> gbOutputNames = new ArrayList<>();
-    // One each for min, max and bloom filter
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(0));
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(1));
-    gbOutputNames.add(SemanticAnalyzer.getColumnInternalName(2));
-
-    int colPos = 0;
     ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
-    for (int i = 0; i < gbOutputNames.size() - 1; i++) {
-      ExprNodeColumnDesc expr = new ExprNodeColumnDesc(key.getTypeInfo(),
-              gbOutputNames.get(colPos++), "", false);
-      rsValueCols.add(expr);
+    Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
+    ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+    List<ColumnInfo> gbySchema = gb.getSchema().getSignature();

Review Comment:
   @ngsg, please remove `gbySchema` since you replaced it with `rsColInfos`; also, change the type to the `List` interface


