Posted to issues@hive.apache.org by "Wei Zhang (Jira)" <ji...@apache.org> on 2022/04/11 07:41:00 UTC
[jira] [Comment Edited] (HIVE-23010) IllegalStateException in tez.ReduceRecordProcessor when containers are being reused
[ https://issues.apache.org/jira/browse/HIVE-23010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520364#comment-17520364 ]
Wei Zhang edited comment on HIVE-23010 at 4/11/22 7:40 AM:
-----------------------------------------------------------
This is because the merge join operator is added as a child of the dummy operator during the first task attempt, and the merge work list is cached across attempts, so a later attempt in the same container sees the already-modified operator tree.
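
The effect can be illustrated with a toy model. This is only a sketch under assumptions, not Hive's actual code: the Op class, findJoinParent, initAttempt and the operator names SEL[1], DUMMY_STORE[2] and MERGEJOIN[3] are hypothetical, and the model assumes that initialization walks down to a leaf, expects a dummy store operator there (as the exception message suggests), and attaches the merge join beneath it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are not Hive's operators.
class CachedPlanReuseSketch {

    static class Op {
        final String name;
        final boolean isDummyStore;
        final List<Op> children = new ArrayList<>();

        Op(String name, boolean isDummyStore) {
            this.name = name;
            this.isDummyStore = isDummyStore;
        }
    }

    // Follow the first child down to a leaf and expect a dummy store there.
    static Op findJoinParent(Op op) {
        if (op.children.isEmpty()) {
            if (!op.isDummyStore) {
                throw new IllegalStateException(
                    "Was expecting dummy store operator but found: " + op.name);
            }
            return op;
        }
        return findJoinParent(op.children.get(0));
    }

    // Per-attempt initialization: attaches the join under the dummy store.
    // This mutates the tree, which matters if the tree is cached.
    static void initAttempt(Op cachedMergeRoot, Op mergeJoin) {
        Op parent = findJoinParent(cachedMergeRoot);
        parent.children.add(mergeJoin);
    }

    static String demo() {
        // Cached merge-side tree: SEL[1] -> DUMMY_STORE[2]
        Op root = new Op("SEL[1]", false);
        root.children.add(new Op("DUMMY_STORE[2]", true));

        // Main tree: MERGEJOIN[3] -> FS[17]
        Op join = new Op("MERGEJOIN[3]", false);
        join.children.add(new Op("FS[17]", false));

        initAttempt(root, join); // first attempt: leaf is the dummy store
        try {
            initAttempt(root, join); // second attempt reuses the mutated tree
            return "no error";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this model the second call walks SEL[1] -> DUMMY_STORE[2] -> MERGEJOIN[3] -> FS[17] and fails at the leaf, which matches the shape of the reported error. Rebuilding the tree per attempt, or detaching the join when an attempt finishes, would avoid the failure in the toy model; whether either is the right fix for Hive is not established here.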
was (Author: zhangweilst):
This is because the merge join operator will be added as the dummy operator's child, and the merge work list is cached across different attempts.
> IllegalStateException in tez.ReduceRecordProcessor when containers are being reused
> -----------------------------------------------------------------------------------
>
> Key: HIVE-23010
> URL: https://issues.apache.org/jira/browse/HIVE-23010
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Sebastian Klemke
> Priority: Major
> Attachments: simplified-explain.txt
>
>
> When executing a query in Hive that runs a filesink, mergejoin and two group by operators in a single reduce vertex (reducer 2 in [^simplified-explain.txt]), the following exception occurs non-deterministically:
> {code:java}
> java.lang.RuntimeException: java.lang.IllegalStateException: Was expecting dummy store operator but found: FS[17]
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
> at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
> at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: Was expecting dummy store operator but found: FS[17]
> at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:421)
> at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:425)
> at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:425)
> at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:425)
> at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:148)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> ... 16 more
> {code}
> Looking at the YARN logs, the IllegalStateException occurs in a container if and only if:
> * the container has been running a task attempt of "Reducer 2" successfully before
> * the container is then being reused for another task attempt of the same "Reducer 2" vertex
> The same query runs fine with tez.am.container.reuse.enabled=false.
> Apparently, this error occurs deterministically within a container that is being reused for multiple task attempts of the same reduce vertex.
> We have not been able to reproduce this error deterministically or with a smaller execution plan, because a container is rarely reused for the same vertex.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)