Posted to dev@reef.apache.org by "Brian Cho (JIRA)" <ji...@apache.org> on 2015/05/05 17:15:08 UTC

[jira] [Commented] (REEF-291) Sporadic job timeouts in REEF-Tests

    [ https://issues.apache.org/jira/browse/REEF-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528620#comment-14528620 ] 

Brian Cho commented on REEF-291:
--------------------------------

Currently, on a job timeout, DefaultClientCloseHandler logs the stack traces of all active threads before throwing a RuntimeException. To help us track down deadlocked threads, I'd like to also log the deadlock information provided by java.lang.management.ThreadMXBean. This is an easier way than {{kill -3}} to get this information on a local machine. In addition, Jenkins keeps the logs from the latest test run in the [workspace|https://builds.apache.org/job/Reef-pull-request-windows/ws/], so this might also help when troubleshooting CI builds.

Does this sound like a good idea? If so, I'll create an issue for it.
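For illustration, here is a minimal sketch of what the proposed logging could look like. It uses only JDK APIs: {{ThreadMXBean.findDeadlockedThreads()}}, {{ThreadMXBean.getThreadInfo(long[], boolean, boolean)}} and {{Thread.getAllStackTraces()}}. The class {{DeadlockReporter}} and its methods are hypothetical, not existing REEF code, and the thread-dump part only stands in for what DefaultClientCloseHandler already does.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper (not part of REEF): dumps thread stacks plus JVM-detected deadlocks. */
public final class DeadlockReporter {

  private DeadlockReporter() {
  }

  /** Returns a report of deadlocked threads, or an empty string if the JVM finds none. */
  public static String deadlockReport() {
    final ThreadMXBean mxBean = ManagementFactory.getThreadMXBean();
    // findDeadlockedThreads() also covers java.util.concurrent locks, but is only
    // available when synchronizer monitoring is supported; both calls return null
    // when no deadlock is detected.
    final long[] ids = mxBean.isSynchronizerUsageSupported()
        ? mxBean.findDeadlockedThreads()
        : mxBean.findMonitorDeadlockedThreads();
    if (ids == null) {
      return "";
    }
    final StringBuilder sb = new StringBuilder("Deadlocked threads:\n");
    // Ask for lock details only if the JVM supports them (otherwise getThreadInfo throws).
    final ThreadInfo[] infos = mxBean.getThreadInfo(
        ids, mxBean.isObjectMonitorUsageSupported(), mxBean.isSynchronizerUsageSupported());
    for (final ThreadInfo info : infos) {
      if (info != null) { // null if the thread has terminated in the meantime
        // ThreadInfo.toString() shows the lock being waited on, its owner, and a truncated stack.
        sb.append(info);
      }
    }
    return sb.toString();
  }

  /** Logs every live thread's stack, then the deadlock report if there is one. */
  public static void logThreadDump(final Logger log) {
    final StringBuilder sb = new StringBuilder("Stack traces of all live threads:\n");
    for (final Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
      sb.append('"').append(entry.getKey().getName()).append("\"\n");
      for (final StackTraceElement frame : entry.getValue()) {
        sb.append("\tat ").append(frame).append('\n');
      }
    }
    log.log(Level.WARNING, sb.toString());

    final String deadlocks = deadlockReport();
    if (!deadlocks.isEmpty()) {
      log.log(Level.SEVERE, deadlocks);
    }
  }
}
```

In a timeout handler, something like {{DeadlockReporter.logThreadDump(LOG)}} would be called just before throwing the RuntimeException. Note that {{findDeadlockedThreads()}} only reports true lock cycles, so a job that hangs for another reason (e.g. a lost message) would still show up only in the plain stack dump.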

> Sporadic job timeouts in REEF-Tests
> -----------------------------------
>
>                 Key: REEF-291
>                 URL: https://issues.apache.org/jira/browse/REEF-291
>             Project: REEF
>          Issue Type: Bug
>          Components: REEF-Tests
>            Reporter: Brian Cho
>            Assignee: Brian Cho
>
> We've been seeing sporadic reef-tests failures in the Windows CI, e.g. [Windows-265|https://builds.apache.org/job/Reef-pull-request-windows/265/consoleFull]. It appears jobs that should fail right away are instead reaching the one-minute timeout. I am now also seeing this semi-regularly on an Ubuntu box.
> This:
> {code}
> Apr 27, 2015 11:05:31 AM org.apache.reef.client.DriverLauncher run
> WARNING: The Job timed out.
> Apr 27, 2015 11:05:31 AM org.apache.reef.runtime.common.client.RunningJobsImpl closeAllJobs
> WARNING: Force close job FailTaskClose
> {code}
> Leads to this:
> {code}
> Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 85.547 sec <<< FAILURE! - in org.apache.reef.tests.fail.FailTaskTest
> testFailTaskClose(org.apache.reef.tests.fail.FailTaskTest)  Time elapsed: 60.183 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<FAILED> but was:<FORCE_CLOSED>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:144)
> 	at org.apache.reef.tests.TestUtils.assertLauncherFailure(TestUtils.java:39)
> 	at org.apache.reef.tests.fail.FailTaskTest.failOn(FailTaskTest.java:52)
> 	at org.apache.reef.tests.fail.FailTaskTest.testFailTaskClose(FailTaskTest.java:91)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)