Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2012/11/11 00:43:12 UTC

[jira] [Created] (MAPREDUCE-4784) TestRecovery occasionally fails

Jason Lowe created MAPREDUCE-4784:
-------------------------------------

             Summary: TestRecovery occasionally fails
                 Key: MAPREDUCE-4784
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4784
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2, test
    Affects Versions: 2.0.3-alpha
            Reporter: Jason Lowe


TestRecovery is occasionally failing with this error:

{noformat}
testCrashed(org.apache.hadoop.mapreduce.v2.app.TestRecovery): TaskAttempt state is not correct (timedout) expected:<FAILED> but was:<STARTING>
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4784) TestRecovery occasionally fails

Posted by "Jason Lowe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494803#comment-13494803 ] 

Jason Lowe commented on MAPREDUCE-4784:
---------------------------------------

The test output from a failing run shows an invalid state transition:

{noformat}
2012-11-10 23:19:07,665 INFO  [AsyncDispatcher event handler] impl.TaskAttemptImpl (TaskAttemptImpl.java:handle(993)) - attempt_0_0000_m_000000_1 TaskAttempt Transitioned from NEW to UNASSIGNED
TaskAttempt State is : FAILED
TaskAttempt State is : STARTING Waiting for state : FAILED   progress : 0.0
2012-11-10 23:19:07,667 ERROR [AsyncDispatcher event handler] impl.TaskAttemptImpl (TaskAttemptImpl.java:handle(984)) - Can't handle this event at current state for attempt_0_0000_m_000000_1
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at UNASSIGNED
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:982)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:996)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:128)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
        at java.lang.Thread.run(Thread.java:662)
{noformat}

I think the problem occurs because the test injects TA_CONTAINER_LAUNCH_FAILED into the attempt state machine asynchronously.  Sometimes the event arrives while the attempt is in a state that accepts it and the test passes; other times it arrives too early and the test fails.  In the log above it arrived while the attempt was still UNASSIGNED, so the state machine rejected it and the attempt never reached FAILED.
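
To illustrate the ordering problem, here is a minimal, deterministic sketch. It is not the real TaskAttemptImpl or TestRecovery code: the class AttemptRaceSketch, its states, its events, and its transition table are all hypothetical simplifications made up for this example. It only shows that the same failure event is accepted or rejected depending on where it lands in the dispatcher queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical toy model, NOT the Hadoop implementation. The transition
// table below is an assumption chosen only to illustrate the ordering issue.
class AttemptRaceSketch {
    enum State { NEW, UNASSIGNED, ASSIGNED, FAILED }
    enum Event { SCHEDULE, CONTAINER_ASSIGNED, CONTAINER_LAUNCH_FAILED }

    static final class Attempt {
        State state = State.NEW;
        int rejected = 0; // events that arrived at a state with no transition

        void handle(Event e) {
            if (state == State.NEW && e == Event.SCHEDULE) {
                state = State.UNASSIGNED;
            } else if (state == State.UNASSIGNED && e == Event.CONTAINER_ASSIGNED) {
                state = State.ASSIGNED;
            } else if (state == State.ASSIGNED && e == Event.CONTAINER_LAUNCH_FAILED) {
                state = State.FAILED;
            } else {
                // Stands in for the "Can't handle this event at current state"
                // error: the event is dropped and the state does not change.
                rejected++;
            }
        }
    }

    // Drains the events in the given order, the way a single dispatcher
    // thread drains its queue, and returns the attempt's final state.
    static State run(Event... order) {
        Attempt attempt = new Attempt();
        Queue<Event> queue = new ArrayDeque<>();
        for (Event e : order) {
            queue.add(e);
        }
        while (!queue.isEmpty()) {
            attempt.handle(queue.poll());
        }
        return attempt.state;
    }

    public static void main(String[] args) {
        // Failure event queued after the container assignment: prints FAILED.
        System.out.println(run(Event.SCHEDULE, Event.CONTAINER_ASSIGNED,
                Event.CONTAINER_LAUNCH_FAILED));
        // Failure event queued too early: it is rejected at UNASSIGNED and the
        // attempt never reaches FAILED. Prints ASSIGNED.
        System.out.println(run(Event.SCHEDULE, Event.CONTAINER_LAUNCH_FAILED,
                Event.CONTAINER_ASSIGNED));
    }
}
```

If this is what is happening, one possible direction is for the test to wait until the attempt reaches a state that accepts the event before injecting it. Whether that fits TestRecovery would need to be checked against the actual test code.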
                
