Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/04/28 10:41:00 UTC
[jira] [Work logged] (HIVE-26006) TopNKey and PTF with more than one column is failing with IOBE

     [ https://issues.apache.org/jira/browse/HIVE-26006?focusedWorklogId=763419&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-763419 ]

ASF GitHub Bot logged work on HIVE-26006:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Apr/22 10:40
            Start Date: 28/Apr/22 10:40
    Worklog Time Spent: 10m 
      Work Description: zabetak commented on code in PR #3082:
URL: https://github.com/apache/hive/pull/3082#discussion_r860699101


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyPushdownProcessor.java:
##########
@@ -244,13 +223,35 @@ private void pushdownThroughLeftOuterJoin(TopNKeyOperator topNKey) throws Semant
             reduceSinkDesc.getColumnExprMap(),
             reduceSinkDesc.getOrder(),
             reduceSinkDesc.getNullOrder());
+
+    pushDownThrough(commonKeyPrefix, topNKey, join, reduceSinkOperator);
+  }
+
+  private <T extends AbstractOperatorDesc> void pushDownThrough(
+          CommonKeyPrefix commonKeyPrefix, TopNKeyOperator topNKey, Operator<T> operator)
+          throws SemanticException {
+
+    pushDownThrough(commonKeyPrefix, topNKey, operator, operator);
+  }
+
+  private <TDesc extends AbstractOperatorDesc, TParentDesc extends AbstractOperatorDesc> void pushDownThrough(
+          CommonKeyPrefix commonKeyPrefix, TopNKeyOperator topNKey,
+          Operator<TDesc> join, Operator<TParentDesc> reduceSinkOperator)
+          throws SemanticException {
+
+    final TopNKeyDesc topNKeyDesc = topNKey.getConf();
     if (commonKeyPrefix.isEmpty() || commonKeyPrefix.size() == topNKeyDesc.getPartitionKeyColumns().size()) {
       return;
     }
 
+    final TopNKeyDesc newTopNKeyDesc = topNKeyDesc.combine(commonKeyPrefix);
+    if (newTopNKeyDesc.getKeyColumns().size() > 0 &&
+            newTopNKeyDesc.getKeyColumns().size() <= newTopNKeyDesc.getPartitionKeyColumns().size()) {

Review Comment:
   Do we need to create the new `TopNKeyDesc` to do this check? Don't we already have all the info?
   
   Can you add more comments on why we need to bail out?
   
   Do we have test coverage for this case? In other words, does an existing test enter this new if statement?
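
To make the first question concrete, here is a minimal sketch of why the check might be expressible without building the new descriptor. It models only list sizes, and it rests on two assumptions that the diff does not confirm: that the `TopNKeyDesc` constructor stores the key and partition key lists as passed by `combine`, and that `commonKeyPrefix.size()` equals `commonKeyPrefix.getMappedColumns().size()`. The class and method names are hypothetical.

```java
public class BailOutCheckSketch {

  // Sizes as they would look after combine(), under the assumptions stated above:
  // key columns = mapped columns, partition key columns = clamped prefix of them.
  static boolean viaCombinedSizes(int prefixSize, int partitionKeySize) {
    int newKeySize = prefixSize;
    int newPartitionKeySize = Math.min(partitionKeySize, prefixSize);
    return newKeySize > 0 && newKeySize <= newPartitionKeySize;
  }

  // The same decision computed from the inputs alone, without a new descriptor.
  static boolean direct(int prefixSize, int partitionKeySize) {
    return prefixSize > 0 && prefixSize <= partitionKeySize;
  }

  public static void main(String[] args) {
    // Exhaustively compare both forms over a small range of sizes.
    for (int prefix = 0; prefix <= 5; prefix++) {
      for (int partition = 0; partition <= 5; partition++) {
        if (viaCombinedSizes(prefix, partition) != direct(prefix, partition)) {
          throw new IllegalStateException("mismatch at " + prefix + ", " + partition);
        }
      }
    }
    System.out.println("both forms agree on all tested sizes");
  }
}
```

If those assumptions hold, the condition reduces to comparing the common prefix size with the original partition key count. Combined with the earlier early return for an empty or equal-sized prefix, that would leave only the case where the prefix is strictly shorter than the partition key list.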



##########
ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java:
##########
@@ -252,7 +252,8 @@ public TopNKeyDescExplainVectorization getTopNKeyVectorization() {
   public TopNKeyDesc combine(CommonKeyPrefix commonKeyPrefix) {
     return new TopNKeyDesc(topN, commonKeyPrefix.getMappedOrder(),
             commonKeyPrefix.getMappedNullOrder(), commonKeyPrefix.getMappedColumns(),
-            commonKeyPrefix.getMappedColumns().subList(0, partitionKeyColumns.size()),
+            commonKeyPrefix.getMappedColumns()
+                    .subList(0, Math.min(partitionKeyColumns.size(), commonKeyPrefix.getMappedColumns().size())),

Review Comment:
   This is the main part of the fix, right? The rest is mostly refactoring to take advantage of the new bail-out condition?
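
For reference, the failure mode and the clamp can be reproduced outside Hive with plain `java.util.List`. This is a sketch, not the Hive code: the class and method names are hypothetical, and the one-element list stands in for a common key prefix that is shorter than the partition key list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubListClampSketch {

  // Mirrors the removed line: throws when partitionKeyCount exceeds the list size.
  static <T> List<T> partitionPrefixUnclamped(List<T> mappedColumns, int partitionKeyCount) {
    return mappedColumns.subList(0, partitionKeyCount);
  }

  // Mirrors the added lines: the upper bound is clamped to the list size.
  static <T> List<T> partitionPrefixClamped(List<T> mappedColumns, int partitionKeyCount) {
    return mappedColumns.subList(0, Math.min(partitionKeyCount, mappedColumns.size()));
  }

  public static void main(String[] args) {
    // One mapped column, but two partition key columns requested.
    List<String> mapped = new ArrayList<>(Arrays.asList("a"));
    try {
      partitionPrefixUnclamped(mapped, 2);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("unclamped subList failed: " + e.getClass().getSimpleName());
    }
    System.out.println("clamped subList size: " + partitionPrefixClamped(mapped, 2).size());
  }
}
```

The clamp only prevents the exception. Whether a descriptor with a truncated partition key list is still valid to push down is a separate question, which is presumably what the new bail-out condition in `TopNKeyPushdownProcessor` addresses.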



##########
ql/src/test/queries/clientpositive/ptf_tnk.q:
##########
@@ -0,0 +1,22 @@
+CREATE EXTERNAL TABLE t1(

Review Comment:
   Would it be possible to also load some data and verify that the results of the query are correct?



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyPushdownProcessor.java:
##########
@@ -244,13 +223,35 @@ private void pushdownThroughLeftOuterJoin(TopNKeyOperator topNKey) throws Semant
             reduceSinkDesc.getColumnExprMap(),
             reduceSinkDesc.getOrder(),
             reduceSinkDesc.getNullOrder());
+
+    pushDownThrough(commonKeyPrefix, topNKey, join, reduceSinkOperator);
+  }
+
+  private <T extends AbstractOperatorDesc> void pushDownThrough(
+          CommonKeyPrefix commonKeyPrefix, TopNKeyOperator topNKey, Operator<T> operator)
+          throws SemanticException {
+
+    pushDownThrough(commonKeyPrefix, topNKey, operator, operator);
+  }
+
+  private <TDesc extends AbstractOperatorDesc, TParentDesc extends AbstractOperatorDesc> void pushDownThrough(
+          CommonKeyPrefix commonKeyPrefix, TopNKeyOperator topNKey,
+          Operator<TDesc> join, Operator<TParentDesc> reduceSinkOperator)

Review Comment:
   Are the operators here strictly a join and reduce sink? From the code I get the impression that there are more options. Should we pick more descriptive names?





Issue Time Tracking
-------------------

    Worklog Id:     (was: 763419)
    Time Spent: 20m  (was: 10m)

> TopNKey and PTF with more than one column is failing with IOBE
> --------------------------------------------------------------
>
>                 Key: HIVE-26006
>                 URL: https://issues.apache.org/jira/browse/HIVE-26006
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Naresh P R
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code:java}
> java.lang.IndexOutOfBoundsException: toIndex = 2
> at java.util.ArrayList.subListRangeCheck(ArrayList.java:1014)
> at java.util.ArrayList.subList(ArrayList.java:1006)
> at org.apache.hadoop.hive.ql.plan.TopNKeyDesc.combine(TopNKeyDesc.java:201)
> at org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdownThroughGroupBy(TopNKeyPushdownProcessor.java:162)
> at org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdown(TopNKeyPushdownProcessor.java:76)
> at org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.process(TopNKeyPushdownProcessor.java:57)
> at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
> at org.apache.hadoop.hive.ql.parse.TezCompiler.runTopNKeyOptimization(TezCompiler.java:1305)
> at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:173)
> at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:159)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12646)
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:358)
> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:283)
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:219)
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:103)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:215){code}


