Posted to dev@reef.apache.org by "Brian Cho (JIRA)" <ji...@apache.org> on 2015/05/05 17:15:08 UTC
[jira] [Commented] (REEF-291) Sporadic job timeouts in REEF-Tests
[ https://issues.apache.org/jira/browse/REEF-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528620#comment-14528620 ]
Brian Cho commented on REEF-291:
--------------------------------
Currently, on job timeouts, DefaultClientCloseHandler logs all active stack traces before throwing a RuntimeException. To help us track down deadlocked threads, I'd like to also log the deadlock information provided by java.lang.management.ThreadMXBean. This is an easier way than {{kill -3}} to get this information on a local machine. In addition, Jenkins stores the logs for the latest test in the [workspace|https://builds.apache.org/job/Reef-pull-request-windows/ws/], so this might help when troubleshooting CI builds as well.
Does this sound like a good idea? If so, I'll create an issue for it.
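For illustration, a minimal sketch of what such deadlock logging could look like. The class name {{DeadlockReport}} and the method {{describeDeadlocks}} are hypothetical and not existing REEF code; only the java.lang.management calls are standard JDK API. How the resulting string would be wired into DefaultClientCloseHandler is left open here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of REEF: builds a loggable deadlock report.
final class DeadlockReport {
  private DeadlockReport() {
  }

  /** Returns a description of all deadlocked threads, or a note that none were found. */
  static String describeDeadlocks() {
    final ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    final boolean monitors = bean.isObjectMonitorUsageSupported();
    final boolean synchronizers = bean.isSynchronizerUsageSupported();

    // findDeadlockedThreads() also covers java.util.concurrent locks, but is only
    // available when synchronizer monitoring is supported; otherwise fall back
    // to monitor-only detection. Both return null when no deadlock exists.
    final long[] ids = synchronizers
        ? bean.findDeadlockedThreads()
        : bean.findMonitorDeadlockedThreads();
    if (ids == null) {
      return "No deadlocked threads detected.";
    }

    final StringBuilder sb = new StringBuilder("Deadlocked threads:\n");
    for (final ThreadInfo info : bean.getThreadInfo(ids, monitors, synchronizers)) {
      if (info == null) {
        continue; // the thread terminated between the two calls
      }
      sb.append('"').append(info.getThreadName()).append("\" waiting on ")
          .append(info.getLockName()).append(" held by \"")
          .append(info.getLockOwnerName()).append("\"\n");
      for (final StackTraceElement element : info.getStackTrace()) {
        sb.append("    at ").append(element).append('\n');
      }
    }
    return sb.toString();
  }
}
```

The timeout handler could then pass the returned string to its logger next to the existing stack traces, for example at WARNING level.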
> Sporadic job timeouts in REEF-Tests
> -----------------------------------
>
> Key: REEF-291
> URL: https://issues.apache.org/jira/browse/REEF-291
> Project: REEF
> Issue Type: Bug
> Components: REEF-Tests
> Reporter: Brian Cho
> Assignee: Brian Cho
>
> We've been seeing sporadic reef-tests failures in the Windows CI, e.g. [Windows-265|https://builds.apache.org/job/Reef-pull-request-windows/265/consoleFull]. It appears that jobs that should fail right away are instead reaching the one-minute timeout. I am now also seeing this semi-regularly on an Ubuntu box.
> This:
> {code}
> Apr 27, 2015 11:05:31 AM org.apache.reef.client.DriverLauncher run
> WARNING: The Job timed out.
> Apr 27, 2015 11:05:31 AM org.apache.reef.runtime.common.client.RunningJobsImpl closeAllJobs
> WARNING: Force close job FailTaskClose
> {code}
> Leads to this:
> {code}
> Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 85.547 sec <<< FAILURE! - in org.apache.reef.tests.fail.FailTaskTest
> testFailTaskClose(org.apache.reef.tests.fail.FailTaskTest) Time elapsed: 60.183 sec <<< FAILURE!
> java.lang.AssertionError: expected:<FAILED> but was:<FORCE_CLOSED>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at org.apache.reef.tests.TestUtils.assertLauncherFailure(TestUtils.java:39)
> at org.apache.reef.tests.fail.FailTaskTest.failOn(FailTaskTest.java:52)
> at org.apache.reef.tests.fail.FailTaskTest.testFailTaskClose(FailTaskTest.java:91)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)