You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2014/09/25 21:05:33 UTC

[jira] [Commented] (TEZ-1624) Flaky tests in TestContainerReuse

    [ https://issues.apache.org/jira/browse/TEZ-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148151#comment-14148151 ] 

Bikas Saha commented on TEZ-1624:
---------------------------------

Below is the original code with some notes.
Couple of comments.
1) The test notification should be happening when the peek() result is empty and we really are going to wait.
2) The initial peek() at this point seems useless. Could we refactor a bit because we eventually need to do the sync(this) peek() anyways
3) Does the addition of container to delayedContainers need to move inside the sync(this) in addDelayedContainers()?
{code}
        // Try allocating containers which have timed out.
        // Required since these containers may get assigned without
        // locality at this point.
        if (delayedContainers.peek() == null) {
          try {
            // test only signaling to make TestTaskScheduler work
            if (drainedDelayedContainersForTest != null) {              <<<<< THIS SHOULD HAPPEN BEFORE WE REALLY WAIT
              drainedDelayedContainersForTest.set(true);
              synchronized (drainedDelayedContainersForTest) {
                drainedDelayedContainersForTest.notifyAll();
              }
            }
            synchronized(this) {
              this.wait();
            }
            // Re-loop to see if tryAssignAll is set.
            continue;
          } catch (InterruptedException e) {
            LOG.info("AllocatedContainerManager Thread interrupted");
          }
        } else {{code}

This fix should have reduced the sleeps in the test or kept them the same and not increased them :) Could you please revert all the sleep increases and check again? We would like to avoid making tests run longer.

> Flaky tests in TestContainerReuse
> ---------------------------------
>
>                 Key: TEZ-1624
>                 URL: https://issues.apache.org/jira/browse/TEZ-1624
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-1624.1.patch
>
>
> Couple of TestContainerReuse tests are failing due to minor race condition in DelayedContainerManager thread.  
> Wanted but not invoked:
> taskSchedulerEventHandlerForTest.taskAllocated(
>     Mock for TaskAttempt, hashCode: 290467934,
>     <any>,
>     Container: [ContainerId: container_1_0001_01_000001, NodeId: host1:0, NodeHttpAddress: host1:0, Resource: <memory:1024, vCores:1>, Priority: 1, Token: null, ]
> );
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:580)
> However, there were other interactions with this mock:
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:531)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:531)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:531)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:532)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:532)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:532)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:532)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:534)
> -> at org.apache.tez.dag.app.rm.TaskSchedulerAppCallbackWrapper$SetApplicationRegistrationDataCallable.call(TaskSchedulerAppCallbackWrapper.java:244)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:570)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:571)
> Wanted but not invoked:
> taskSchedulerEventHandlerForTest.taskAllocated(
>     Mock for TaskAttempt, hashCode: 392638651,
>     <any>,
>     Container: [ContainerId: container_1_0001_01_000001, NodeId: host1:0, NodeHttpAddress: host1:0, Resource: <memory:1024, vCores:1>, Priority: 5, Token: null, ]
> );
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:333)
> However, there were other interactions with this mock:
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:289)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:289)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:289)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:290)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:290)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:290)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:290)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:292)
> -> at org.apache.tez.dag.app.rm.TaskSchedulerAppCallbackWrapper$SetApplicationRegistrationDataCallable.call(TaskSchedulerAppCallbackWrapper.java:244)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:323)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:324)
>         at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:333)
> org.mockito.exceptions.verification.WantedButNotInvoked:
> Wanted but not invoked:
> taskSchedulerEventHandlerForTest.taskAllocated(
>     Mock for TaskAttempt, hashCode: 1830222901,
>     <any>,
>     Container: [ContainerId: container_1_0001_01_000001, NodeId: host1:0, NodeHttpAddress: host1:0, Resource: <memory:1024, vCores:1>, Priority: 3, Token: null, ]
> );
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:909)
> However, there were other interactions with this mock:
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:861)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:861)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:861)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:862)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:862)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:862)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:862)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:864)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:900)
> -> at org.apache.tez.dag.app.rm.TaskSchedulerAppCallbackWrapper$SetApplicationRegistrationDataCallable.call(TaskSchedulerAppCallbackWrapper.java:244)
>         at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:909)
> testDelayedReuseContainerBecomesAvailable(org.apache.tez.dag.app.rm.TestContainerReuse)  Time elapsed: 0.053 sec  <<< FAILURE!
> org.mockito.exceptions.verification.WantedButNotInvoked:
> Wanted but not invoked:
> taskSchedulerEventHandlerForTest.taskAllocated(
>     Mock for TaskAttempt, hashCode: 1829491577,
>     <any>,
>     Container: [ContainerId: container_1_0001_01_000001, NodeId: host1:0, NodeHttpAddress: host1:0, Resource: <memory:1024, vCores:1>, Priority: 5, Token: null, ]
> );
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:202)
> However, there were other interactions with this mock:
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:151)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:151)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:151)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:152)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:152)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:152)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:152)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:154)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:191)
> -> at org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:192)
> -> at org.apache.tez.dag.app.rm.TaskSchedulerAppCallbackWrapper$SetApplicationRegistrationDataCallable.call(TaskSchedulerAppCallbackWrapper.java:244)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)