You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Haibo Chen (JIRA)" <ji...@apache.org> on 2016/04/15 18:12:25 UTC

[jira] [Updated] (MAPREDUCE-6675) TestJobImpl.testUnusableNode failed

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated MAPREDUCE-6675:
----------------------------------
    Description: 
TestJobImpl#testUnusableNodeTransition is flaky.

2016-02-13 09:16:42 Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
2016-02-13 09:16:50 Tests run: 17, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.324 sec <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
2016-02-13 09:16:50 testUnusableNodeTransition(org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl)  Time elapsed: 5.165 sec  <<< FAILURE!
2016-02-13 09:16:50 java.lang.AssertionError: expected:<SUCCEEDED> but was:<ERROR>
2016-02-13 09:16:50 	at org.junit.Assert.fail(Assert.java:88)
2016-02-13 09:16:50 	at org.junit.Assert.failNotEquals(Assert.java:743)
2016-02-13 09:16:50 	at org.junit.Assert.assertEquals(Assert.java:118)
2016-02-13 09:16:50 	at org.junit.Assert.assertEquals(Assert.java:144)
2016-02-13 09:16:50 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:977)
2016-02-13 09:16:50 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testUnusableNodeTransition(TestJobImpl.java:627)
2016-02-13 09:16:50 
2016-02-13 09:16:50 
2016-02-13 09:16:50 Results :
2016-02-13 09:16:50 
2016-02-13 09:16:50 Failed tests: 
2016-02-13 09:16:50   TestJobImpl.testUnusableNodeTransition:627->assertJobState:977 expected:<SUCCEEDED> but was:<ERROR>
2016-02-13 09:16:50 
2016-02-13 09:16:50 Tests run: 17, Failures: 1, Errors: 0, Skipped: 0.


Looking at the code, an JobUpdatedNodesEvent is handled by putting an TaskAttemptKill event on the async dispatcher queue and return immediately, but the event might not have been processed by the time  all JobTaskEvents events are seen by the job (the jobTaskSucceeded events are handed to Job immediately without going through the dispatcher). Therefore, there is a slight chance that the job will see all three succeeded attempts and  transition to Committing state before the taskAttemptKill event is handled by the dispatcher. Committing jobs will reject later JobTaskEvents received, transition to InternalError state and cause the test to fail.

  was:
TestJobImpl#testUnusableNodeTransition is flaky.

2016-02-13 09:16:42 Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
2016-02-13 09:16:50 Tests run: 17, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.324 sec <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
2016-02-13 09:16:50 testUnusableNodeTransition(org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl)  Time elapsed: 5.165 sec  <<< FAILURE!
2016-02-13 09:16:50 java.lang.AssertionError: expected:<SUCCEEDED> but was:<ERROR>
2016-02-13 09:16:50 	at org.junit.Assert.fail(Assert.java:88)
2016-02-13 09:16:50 	at org.junit.Assert.failNotEquals(Assert.java:743)
2016-02-13 09:16:50 	at org.junit.Assert.assertEquals(Assert.java:118)
2016-02-13 09:16:50 	at org.junit.Assert.assertEquals(Assert.java:144)
2016-02-13 09:16:50 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:977)
2016-02-13 09:16:50 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testUnusableNodeTransition(TestJobImpl.java:627)
2016-02-13 09:16:50 
2016-02-13 09:16:50 
2016-02-13 09:16:50 Results :
2016-02-13 09:16:50 
2016-02-13 09:16:50 Failed tests: 
2016-02-13 09:16:50   TestJobImpl.testUnusableNodeTransition:627->assertJobState:977 expected:<SUCCEEDED> but was:<ERROR>
2016-02-13 09:16:50 
2016-02-13 09:16:50 Tests run: 17, Failures: 1, Errors: 0, Skipped: 0.


Looking at the code, an JobUpdatedNodesEvent is handled by putting an TaskAttemptKill event on the async dispatcher queue and return immediately, but the event might not have been processed by the time  all JobTaskEvents events are seen by the job (the jobTaskSucceeded events are handed to Job immediately without going through the dispatcher). Therefore, there is a slight chance that the job will see all three succeeded attempts and  transition to Committing state before the taskAttemptKill event is handled by the dispatcher. Committing jobs will reject later JobTaskEvents received and causing the failure.


> TestJobImpl.testUnusableNode failed 
> ------------------------------------
>
>                 Key: MAPREDUCE-6675
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6675
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.7.3
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>
> TestJobImpl#testUnusableNodeTransition is flaky.
> 2016-02-13 09:16:42 Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
> 2016-02-13 09:16:50 Tests run: 17, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.324 sec <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
> 2016-02-13 09:16:50 testUnusableNodeTransition(org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl)  Time elapsed: 5.165 sec  <<< FAILURE!
> 2016-02-13 09:16:50 java.lang.AssertionError: expected:<SUCCEEDED> but was:<ERROR>
> 2016-02-13 09:16:50 	at org.junit.Assert.fail(Assert.java:88)
> 2016-02-13 09:16:50 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 2016-02-13 09:16:50 	at org.junit.Assert.assertEquals(Assert.java:118)
> 2016-02-13 09:16:50 	at org.junit.Assert.assertEquals(Assert.java:144)
> 2016-02-13 09:16:50 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:977)
> 2016-02-13 09:16:50 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testUnusableNodeTransition(TestJobImpl.java:627)
> 2016-02-13 09:16:50 
> 2016-02-13 09:16:50 
> 2016-02-13 09:16:50 Results :
> 2016-02-13 09:16:50 
> 2016-02-13 09:16:50 Failed tests: 
> 2016-02-13 09:16:50   TestJobImpl.testUnusableNodeTransition:627->assertJobState:977 expected:<SUCCEEDED> but was:<ERROR>
> 2016-02-13 09:16:50 
> 2016-02-13 09:16:50 Tests run: 17, Failures: 1, Errors: 0, Skipped: 0.
> Looking at the code, an JobUpdatedNodesEvent is handled by putting an TaskAttemptKill event on the async dispatcher queue and return immediately, but the event might not have been processed by the time  all JobTaskEvents events are seen by the job (the jobTaskSucceeded events are handed to Job immediately without going through the dispatcher). Therefore, there is a slight chance that the job will see all three succeeded attempts and  transition to Committing state before the taskAttemptKill event is handled by the dispatcher. Committing jobs will reject later JobTaskEvents received, transition to InternalError state and cause the test to fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)