Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2015/09/14 03:31:45 UTC

[jira] [Commented] (TEZ-2816) Build with hadoop 2.4 fails

    [ https://issues.apache.org/jira/browse/TEZ-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742784#comment-14742784 ] 

Bikas Saha commented on TEZ-2816:
---------------------------------

The core issue is a bug in the scheduler preemption code. Not sure yet why it does not repro in master.
When a task needs preemption, we wait 3 heartbeats for the RM to respond with new containers before preempting a different task to make room for it. The preemption test checks for this wait.
However, if no preemption ends up happening, the counter that tracks the heartbeat of the last preemption can be left with a stale value. The next time a task needs preemption, that stale value can cause the preemption to trigger immediately instead of after 3 heartbeats.
Fixed the code so that this counter always stays in sync with the heartbeat counter, except while there is a preemption candidate and we are waiting out the 3 heartbeats. Fixed the test case to verify this.
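To make the stale-counter behaviour concrete, here is a rough sketch of the idea. This is not the actual Tez scheduler code: the class, fields and methods (PreemptionHeartbeatSketch, heartbeatAtLastPreemption, onHeartbeat, syncWhenIdle) are made up for illustration, and the exact point at which the 3-heartbeat wait starts counting is an assumption. It also simplifies the "no preemption happened" case to heartbeats on which there is no preemption candidate.

```java
/** Hypothetical sketch of the heartbeat bookkeeping; not the actual Tez scheduler code. */
class PreemptionHeartbeatSketch {
    // Heartbeats to wait for the RM before preempting (the "3" described above).
    static final int HEARTBEATS_TO_WAIT = 3;

    // true models the fixed behaviour, false models the stale-counter bug.
    private final boolean syncWhenIdle;
    private int heartbeatCount = 0;
    private int heartbeatAtLastPreemption = 0;

    PreemptionHeartbeatSketch(boolean syncWhenIdle) {
        this.syncWhenIdle = syncWhenIdle;
    }

    /** Called once per heartbeat; returns true if a preemption is triggered on it. */
    boolean onHeartbeat(boolean hasPreemptionCandidate) {
        heartbeatCount++;
        if (!hasPreemptionCandidate) {
            if (syncWhenIdle) {
                // The fix: with nothing waiting, keep the marker in step with the heartbeat count.
                heartbeatAtLastPreemption = heartbeatCount;
            }
            return false;
        }
        if (heartbeatCount - heartbeatAtLastPreemption < HEARTBEATS_TO_WAIT) {
            return false; // still giving the RM a chance to hand out containers
        }
        heartbeatAtLastPreemption = heartbeatCount;
        return true;
    }

    /** Stays idle for idleHeartbeats, then counts candidate heartbeats until preemption fires. */
    static int heartbeatsUntilPreemption(boolean syncWhenIdle, int idleHeartbeats) {
        PreemptionHeartbeatSketch s = new PreemptionHeartbeatSketch(syncWhenIdle);
        for (int i = 0; i < idleHeartbeats; i++) {
            s.onHeartbeat(false);
        }
        int waited = 1;
        while (!s.onHeartbeat(true)) {
            waited++;
        }
        return waited;
    }

    public static void main(String[] args) {
        System.out.println("stale counter:  " + heartbeatsUntilPreemption(false, 5));
        System.out.println("synced counter: " + heartbeatsUntilPreemption(true, 5));
    }
}
```

In this sketch, after 5 idle heartbeats the stale variant preempts on the very first heartbeat that has a candidate, while the synced variant waits until the third.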
[~zjffdu] Please review.

> Build with hadoop 2.4 fails
> ---------------------------
>
>                 Key: TEZ-2816
>                 URL: https://issues.apache.org/jira/browse/TEZ-2816
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Assignee: Bikas Saha
>         Attachments: TEZ-2816.1.patch
>
>
> https://builds.apache.org/job/Tez-Build-Hadoop-2.4/170/console
> {noformat}
> Running org.apache.tez.analyzer.TestAnalyzer
> Tests run: 13, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 99.595 sec <<< FAILURE!
> testBasicInputFailureWithoutExit(org.apache.tez.analyzer.TestAnalyzer)  Time elapsed: 6.276 sec  <<< FAILURE!
> java.lang.AssertionError: v2 : 000000_0
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.assertTrue(Assert.java:41)
> 	at org.apache.tez.analyzer.TestAnalyzer.verifyCriticalPath(TestAnalyzer.java:273)
> 	at org.apache.tez.analyzer.TestAnalyzer.runDAGAndVerify(TestAnalyzer.java:220)
> 	at org.apache.tez.analyzer.TestAnalyzer.testBasicInputFailureWithoutExit(TestAnalyzer.java:399)
> testCascadingInputFailureWithExitSuccess(org.apache.tez.analyzer.TestAnalyzer)  Time elapsed: 5.986 sec  <<< FAILURE!
> java.lang.AssertionError: v3 : 000000_1
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.assertTrue(Assert.java:41)
> 	at org.apache.tez.analyzer.TestAnalyzer.verifyCriticalPath(TestAnalyzer.java:273)
> 	at org.apache.tez.analyzer.TestAnalyzer.runDAGAndVerify(TestAnalyzer.java:220)
> 	at org.apache.tez.analyzer.TestAnalyzer.testCascadingInputFailureWithExitSuccess(TestAnalyzer.java:561)
> Results :
> Failed tests: 
>   TestAnalyzer.testBasicInputFailureWithoutExit:399->runDAGAndVerify:220->verifyCriticalPath:273 v2 : 000000_0
>   TestAnalyzer.testCascadingInputFailureWithExitSuccess:561->runDAGAndVerify:220->verifyCriticalPath:273 v3 : 000000_1
> {noformat}


