Posted to issues@hive.apache.org by "Jason Dere (JIRA)" <ji...@apache.org> on 2015/06/17 01:17:00 UTC

[jira] [Commented] (HIVE-11028) Tez: table self join and join with another table fails with IndexOutOfBoundsException

    [ https://issues.apache.org/jira/browse/HIVE-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588988#comment-14588988 ] 

Jason Dere commented on HIVE-11028:
-----------------------------------

This appears to happen because TezCompiler invokes ConstantPropagate, which removes some columns, but there is no corresponding call to ColumnPruner to remove the matching outputColumnNames from the join operator.
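
To illustrate how that kind of mismatch can surface as the IndexOutOfBoundsException in the trace, here is a minimal sketch. It is a toy model, not Hive's StandardStructObjectInspector: the class StructFieldMismatchSketch, the method buildFields, and the use of plain strings for field types are all assumptions made for illustration. It only shows that walking a list of output column names while indexing into a shorter parallel list fails at the first missing entry.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not Hive's object inspector classes.
final class StructFieldMismatchSketch {

    // Pairs each field name with the type at the same index.
    // Assumes both lists have the same length; nothing checks that here.
    static List<String> buildFields(List<String> fieldNames, List<String> fieldTypes) {
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < fieldNames.size(); i++) {
            // Throws IndexOutOfBoundsException if fieldTypes is shorter than fieldNames.
            fields.add(fieldNames.get(i) + ":" + fieldTypes.get(i));
        }
        return fields;
    }
}
```

If the names list still holds an entry such as "_col0" while the parallel list has been emptied, the first get(0) on the empty list throws, which is the same shape of failure as "Index: 0, Size: 0" in the report.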

After talking to [~jpullokkaran] and [~hagleitn]: ConstantPropagate is used in TezCompiler to remove the extra (and unnecessary) "AND true" predicates generated during dynamic partition pruning. One solution is to eliminate just those expressions (referred to in ConstantPropagate as short-cutting) instead of doing full constant folding. I'll try to add an option to ConstantPropagate that lets the caller request expression short-cutting only, rather than full constant folding.
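
As a rough illustration of what "short-cutting only" could mean, the sketch below strips "AND true" from a toy expression tree and leaves every other predicate untouched. It is not the ConstantPropagate implementation: ShortCutSketch, Expr, and shortCut are hypothetical names, and the tree is far simpler than Hive's expression descriptors. The point is that a predicate like id2 = 'ab' survives as-is, so no column is replaced by a constant and no column list changes.

```java
// Toy model only: Expr is a hypothetical stand-in, not Hive's ExprNodeDesc.
final class ShortCutSketch {

    static final class Expr {
        final Boolean constant; // non-null for a boolean literal
        final String predicate; // non-null for an opaque predicate such as "id2 = 'ab'"
        final Expr left;        // non-null (together with right) for an AND node
        final Expr right;

        private Expr(Boolean constant, String predicate, Expr left, Expr right) {
            this.constant = constant;
            this.predicate = predicate;
            this.left = left;
            this.right = right;
        }

        static Expr literal(boolean value) { return new Expr(value, null, null, null); }
        static Expr pred(String text) { return new Expr(null, text, null, null); }
        static Expr and(Expr l, Expr r) { return new Expr(null, null, l, r); }

        boolean isTrue() { return constant != null && constant; }
        boolean isAnd() { return left != null; }

        @Override
        public String toString() {
            if (constant != null) return constant.toString();
            if (predicate != null) return predicate;
            return "(" + left + " AND " + right + ")";
        }
    }

    // Removes "AND true" bottom-up. Opaque predicates are returned unchanged,
    // so no constant is propagated into column references.
    static Expr shortCut(Expr e) {
        if (!e.isAnd()) {
            return e;
        }
        Expr l = shortCut(e.left);
        Expr r = shortCut(e.right);
        if (l.isTrue()) return r;
        if (r.isTrue()) return l;
        return Expr.and(l, r);
    }
}
```

For example, shortCut applied to ((id2 = 'ab' AND true) AND true) returns the bare predicate id2 = 'ab', while an AND of two ordinary predicates comes back unchanged.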

> Tez: table self join and join with another table fails with IndexOutOfBoundsException
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-11028
>                 URL: https://issues.apache.org/jira/browse/HIVE-11028
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>
> {noformat}
> create table tez_self_join1(id1 int, id2 string, id3 string);
> insert into table tez_self_join1 values(1, 'aa','bb'), (2, 'ab','ab'), (3,'ba','ba');
> create table tez_self_join2(id1 int);
> insert into table tez_self_join2 values(1),(2),(3);
> explain
> select s.id2, s.id3
> from
> (
>  select self1.id1, self1.id2, self1.id3
>  from tez_self_join1 self1 join tez_self_join1 self2
>  on self1.id2=self2.id3 ) s
> join tez_self_join2
> on s.id1=tez_self_join2.id1
> where s.id2='ab';
> {noformat}
> fails with error:
> {noformat}
> 2015-06-16 15:41:55,759 ERROR [main]: ql.Driver (SessionState.java:printError(979)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 3, vertexId=vertex_1434494327112_0002_4_04, diagnostics=[Task failed, taskId=task_1434494327112_0002_4_04_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>         at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>         at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>         at java.util.ArrayList.get(ArrayList.java:411)
>         at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:118)
>         at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.<init>(StandardStructObjectInspector.java:109)
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:290)
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:275)
>         at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getJoinOutputObjectInspector(CommonJoinOperator.java:175)
>         at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.initializeOp(CommonJoinOperator.java:313)
>         at org.apache.hadoop.hive.ql.exec.AbstractMapJoinOperator.initializeOp(AbstractMapJoinOperator.java:71)
>         at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.initializeOp(CommonMergeJoinOperator.java:99)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
>         at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:146)
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
>         ... 13 more
> {noformat}


