Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/06/27 00:24:00 UTC

[jira] [Work logged] (HIVE-26179) In tez reuse container mode, asyncInitOperations are not clear.

     [ https://issues.apache.org/jira/browse/HIVE-26179?focusedWorklogId=784899&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-784899 ]

ASF GitHub Bot logged work on HIVE-26179:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Jun/22 00:23
            Start Date: 27/Jun/22 00:23
    Worklog Time Spent: 10m 
      Work Description: github-actions[bot] commented on PR #3249:
URL: https://github.com/apache/hive/pull/3249#issuecomment-1166693471

   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.




Issue Time Tracking
-------------------

    Worklog Id:     (was: 784899)
    Time Spent: 0.5h  (was: 20m)

> In tez reuse container mode, asyncInitOperations are not clear.
> ---------------------------------------------------------------
>
>                 Key: HIVE-26179
>                 URL: https://issues.apache.org/jira/browse/HIVE-26179
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Tez
>    Affects Versions: 1.2.1
>         Environment: engine: Tez (Note: tez.am.container.reuse.enabled is true)
>  
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In our cluster, we found an error like this:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1650608671415_321290_1_11, diagnostics=[Task failed, taskId=task_1650608671415_321290_1_11_000422, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1650608671415_321290_1_11_000422_0:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:135)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>     at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:349)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:161)
>     ... 16 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:488)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:684)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:698)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:338)
>     ... 17 more
> {code}
> When Tez container reuse is enabled and a MapJoinOperator is used, an NPE is thrown if different task attempts of the same task execute in the same container.
> While debugging, I found that the second task attempt uses the first task attempt's asyncInitOperations. asyncInitOperations is not cleared when the operator is closed, so the second task attempt may use the first task attempt's mapJoinTables, whose HybridHashTableContainer.HashPartition is already closed, which causes the NPE.
> We must clear asyncInitOperations when the operator is closed.
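
The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not Hive's code: the classes Table and Op, the clearOnClose flag, and the method secondAttemptFails are hypothetical stand-ins, and only the field name asyncInitOperations is taken from the report. The sketch assumes that an operator instance survives across task attempts in a reused container and that initialization picks up whatever future is already registered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class AsyncInitReuseSketch {

    /** Hypothetical stand-in for a hash table that is unusable once closed. */
    static final class Table {
        private int[] data = {1, 2, 3};

        void close() {
            data = null;
        }

        int size() {
            return data.length; // throws NullPointerException after close()
        }
    }

    /** Hypothetical stand-in for an operator instance reused inside one container. */
    static final class Op {
        private final List<CompletableFuture<Table>> asyncInitOperations = new ArrayList<>();
        private final boolean clearOnClose;
        private Table table;

        Op(boolean clearOnClose) {
            this.clearOnClose = clearOnClose;
        }

        void initialize() {
            // Register this attempt's asynchronous load (already completed here for simplicity).
            asyncInitOperations.add(CompletableFuture.completedFuture(new Table()));
            // Take the first registered future. If the list survived a previous
            // close(), this is the stale future from the earlier attempt.
            table = asyncInitOperations.get(0).join();
        }

        int process() {
            return table.size();
        }

        void close() {
            table.close();
            if (clearOnClose) {
                asyncInitOperations.clear(); // the cleanup the report asks for
            }
        }
    }

    /** Runs two attempts on the same Op and reports whether the second one hits an NPE. */
    static boolean secondAttemptFails(boolean clearOnClose) {
        Op op = new Op(clearOnClose);
        op.initialize();
        op.process();
        op.close();

        op.initialize(); // second attempt in the "same container"
        try {
            op.process();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("without clear: NPE = " + secondAttemptFails(false));
        System.out.println("with clear:    NPE = " + secondAttemptFails(true));
    }
}
```

In this sketch the second attempt fails only when the list is left populated after close(); clearing it on close lets the second attempt see its own freshly loaded table. Whether the actual patch performs the cleanup in exactly this place is not stated in this message.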



--
This message was sent by Atlassian Jira
(v8.20.7#820007)