Posted to issues@tez.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2015/05/12 02:09:00 UTC

[jira] [Commented] (TEZ-2426) Ensure the eventRouter thread completes before switching to a new task and thread safety fixes in IPOContexts.

    [ https://issues.apache.org/jira/browse/TEZ-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538928#comment-14538928 ] 

Daniel Dai commented on TEZ-2426:
---------------------------------

This breaks the Pig unit test TestUnionOnSchema.testUnionOnSchemaSuccOps. Error message:
{code}
Error: Failure while running task:org.apache.pig.backend.executionengine.ExecException: ERROR 0: Input from vertex scope-74 is missing
	at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POIdentityInOutTez.attachInputs(POIdentityInOutTez.java:91)
	at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.initializeInputs(PigProcessor.java:291)
	at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:183)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:334)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{code}

Judging from the stack trace, the failure is still on the Pig side. I am not sure what exactly happens yet; I will update later.

> Ensure the eventRouter thread completes before switching to a new task and thread safety fixes in IPOContexts.
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-2426
>                 URL: https://issues.apache.org/jira/browse/TEZ-2426
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Bikas Saha
>            Assignee: Siddharth Seth
>            Priority: Critical
>             Fix For: 0.7.0
>
>         Attachments: TEZ-2426.1.txt, TEZ-2426.2.txt, am.log, container.log
>
>
> Sequence of events:
> 1) Task A starts in a container.
> 2) The completion event for Task A reaches the AM.
> 3) Task B starts in the same container.
> 4) Task A's input calls a method on its context and crashes with an NPE.
> 5) The crash sends an input-failed event for Task A to the AM.
> 6) The Task A state machine crashes, reporting that it cannot handle a failure after success.
> In some cases a status update event may also be sent after completion, though it is not clear whether that is related to the failed event being sent.
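
For illustration only, the ordering problem in the quoted sequence of events can be sketched in plain Java. This is not the TEZ-2426 patch and does not use Tez APIs. EventRouterSketch, TaskRuntime, shutdownAndWait and the "__STOP__" sentinel are hypothetical names made up for this sketch. The idea it shows is that the container joins Task A's event router thread before it starts Task B, so no event for Task A can be delivered after the switch.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these names are Tez classes or APIs.
final class EventRouterSketch {

    /** Stand-in for one task's runtime, which owns an event router thread. */
    static final class TaskRuntime {
        private static final String STOP = "__STOP__"; // sentinel that ends the router loop
        private final BlockingQueue<String> events = new LinkedBlockingQueue<>();
        private final AtomicInteger delivered = new AtomicInteger();
        private final Thread router;

        TaskRuntime(String taskName) {
            router = new Thread(() -> {
                try {
                    while (true) {
                        String event = events.take();
                        if (STOP.equals(event)) {
                            return; // clean exit once the sentinel is seen
                        }
                        delivered.incrementAndGet(); // "deliver" the event
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "eventRouter-" + taskName);
            router.start();
        }

        void send(String event) {
            events.add(event);
        }

        /** Ask the router to stop and block until its thread has exited. */
        void shutdownAndWait(long timeoutMillis) {
            events.add(STOP);
            try {
                router.join(timeoutMillis);
                if (router.isAlive()) {
                    // Router did not stop in time: interrupt it and wait again.
                    router.interrupt();
                    router.join();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        boolean routerAlive() {
            return router.isAlive();
        }

        int deliveredCount() {
            return delivered.get();
        }
    }

    public static void main(String[] args) {
        TaskRuntime a = new TaskRuntime("A");
        a.send("input-event-1");
        a.shutdownAndWait(5000);
        // Only after A's router has exited is the container reused for B.
        TaskRuntime b = new TaskRuntime("B");
        System.out.println("A router alive when B starts: " + a.routerAlive());
        b.shutdownAndWait(5000);
    }
}
```

Without the join in shutdownAndWait, Task A's router thread could still be handling an event after Task B has started, which is the window described in steps 3) and 4).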


