Posted to yarn-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2018/05/09 18:31:00 UTC

[jira] [Comment Edited] (YARN-8244) TestContainerSchedulerQueuing.testStartMultipleContainers failed

    [ https://issues.apache.org/jira/browse/YARN-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469262#comment-16469262 ] 

Jason Lowe edited comment on YARN-8244 at 5/9/18 6:30 PM:
----------------------------------------------------------

Thanks for the patch, Jim!

A lot of other tests also reuse container launch contexts, e.g. testQueueMultipleContainers and testStartAndQueueMultipleContainers.  If updating the test is indeed the correct fix, then many other tests need the same change.

Interestingly, I don't see anything in ContainerLaunchContext that is necessarily specific to a particular container (e.g. a container ID) and would force a 1-to-1 mapping between a launch context and a container instance.  So in theory a ContainerLaunchContext could be reused across containers.  In practice, however, these objects are deserialized from protocol buffers, and I don't see how the startContainers method could be invoked with a single reused ContainerLaunchContext object for all container start requests.
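
To illustrate why a reused launch context object matters, here is a minimal sketch. It assumes the failure comes from one launch modifying an environment map while another launch is still iterating over the same map. FakeLaunchContext and every method name below are hypothetical stand-ins, not the YARN API, and the interleaving of the two launches is simulated in a single thread so the outcome is deterministic.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a launch context; NOT the real YARN class.
class FakeLaunchContext {
    private final Map<String, String> environment = new HashMap<>();

    Map<String, String> getEnvironment() {
        return environment;
    }
}

class SharedLaunchContextDemo {
    // Builds a context with two environment entries so that iteration
    // has a second step after the simulated mutation.
    static FakeLaunchContext newContext() {
        FakeLaunchContext ctx = new FakeLaunchContext();
        ctx.getEnvironment().put("JAVA_HOME", "/opt/java");
        ctx.getEnvironment().put("PATH", "/usr/bin");
        return ctx;
    }

    // Iterates over the environment of ctxA (as a launch-script writer
    // might) while a second "launch" adds a variable to ctxB's environment.
    // Returns true if the iteration fails with ConcurrentModificationException.
    static boolean launchWhileOtherLaunchMutates(FakeLaunchContext ctxA,
                                                 FakeLaunchContext ctxB) {
        try {
            boolean mutated = false;
            for (Map.Entry<String, String> entry : ctxA.getEnvironment().entrySet()) {
                if (!mutated) {
                    // Simulated scribbling by the other container's launch.
                    ctxB.getEnvironment().put("CONTAINER_B_ONLY", "1");
                    mutated = true;
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        FakeLaunchContext shared = newContext();
        // Same object used for both launches: the iterator is invalidated.
        System.out.println("shared context fails: "
            + launchWhileOtherLaunchMutates(shared, shared));
        // One object per launch: the mutation touches a different map.
        System.out.println("separate contexts fail: "
            + launchWhileOtherLaunchMutates(newContext(), newContext()));
    }
}
```

Running main prints "shared context fails: true" followed by "separate contexts fail: false". In the real NodeManager the two launches would run on different launcher threads, so the failure would be timing dependent rather than guaranteed.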

So we either need to fix every test that shares a container launch context object across container start requests, or the NM code needs to stop assuming ContainerLaunchContext objects are safe to scribble on, in case we ever support calling startContainers in a way that lets a common launch context be reused across container start requests.
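
As a rough sketch of the second option, the launch path could copy the environment before adding per-container entries, so the caller's launch context is never modified. This is only an illustration under that assumption: buildContainerEnv and DefensiveCopyDemo are hypothetical names, not existing NodeManager code, and plain maps stand in for the real launch context.

```java
import java.util.HashMap;
import java.util.Map;

class DefensiveCopyDemo {
    // Hypothetical helper: derives a per-container environment from a
    // possibly shared base map without modifying the base map.
    static Map<String, String> buildContainerEnv(Map<String, String> sharedEnv,
                                                 String containerId) {
        // Copy first, then scribble only on the private copy.
        Map<String, String> env = new HashMap<>(sharedEnv);
        env.put("CONTAINER_ID", containerId);
        return env;
    }

    public static void main(String[] args) {
        Map<String, String> shared = new HashMap<>();
        shared.put("PATH", "/usr/bin");

        Map<String, String> first = buildContainerEnv(shared, "container_1");
        Map<String, String> second = buildContainerEnv(shared, "container_2");

        System.out.println("shared size: " + shared.size());
        System.out.println("first id: " + first.get("CONTAINER_ID"));
        System.out.println("second id: " + second.get("CONTAINER_ID"));
    }
}
```

The trade-off is one extra map copy per container launch in exchange for making the shared object effectively read-only on this path. Note that HashMap's copy constructor still reads the source map, so the copy is only safe if nothing else writes to the shared map at the same time.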



was (Author: jlowe):
Thanks for the patch, Jim!

There are a lot of other tests that are also reusing container launch contexts, e.g.: testQueueMultipleContainers, testStartAndQueueMultipleContainers, etc.  If fixing the test is indeed the correct fix then there are many other tests that also need to be fixed.

Interestingly, I don't see anything in the ContainerLaunchContext that is necessarily specific to a particular container (e.g.: a container ID) that would force the container launch context to have a 1-to-1 mapping to a container instance.  Therefore I think theoretically ContainerLaunchContext could be reused across containers.  However in practice these things are deserialized from protocol buffers, and I don't see how the startContainers method could be invoked with a reused ContainerLaunchContext object for all container start requests.

So we either need to fix all instances in the tests that are sharing container launch context objects across container start requests, or the NM code would need to stop assuming ContainerLaunchContext objects are safe to scribble on if we ever support calling startContainers in such a way where a common launch context could be reused across container start requests.


>  TestContainerSchedulerQueuing.testStartMultipleContainers failed
> -----------------------------------------------------------------
>
>                 Key: YARN-8244
>                 URL: https://issues.apache.org/jira/browse/YARN-8244
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Miklos Szegedi
>            Assignee: Jim Brennan
>            Priority: Major
>         Attachments: YARN-8244.001.patch
>
>
> {code:java}
> testStartMultipleContainers(org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing)  Time elapsed: 22.198 s  <<< FAILURE!
> java.lang.AssertionError: ContainerState is not correct (timedout)
>         at org.junit.Assert.fail(Assert.java:88)
>         at org.junit.Assert.assertTrue(Assert.java:41)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:344)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:309)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing.testStartMultipleContainers(TestContainerSchedulerQueuing.java:256)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:497)
>         at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>         at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>         at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>         at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>         at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>         at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>         at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>         at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>         at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>         at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>         at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>         at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>         at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>         at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>         at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>         at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
>         at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
>         at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
>         at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413){code}
> {code:java}
> 2018-05-03 17:31:35,028 WARN [ContainersLauncher #1] launcher.ContainerLaunch (ContainerLaunch.java:call(329)) - Failed to launch container.
> java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
> at java.util.HashMap$EntryIterator.next(HashMap.java:1471)
> at java.util.HashMap$EntryIterator.next(HashMap.java:1469)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch$ShellScriptBuilder.orderEnvByDependencies(ContainerLaunch.java:1311)
> at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.writeLaunchEnv(ContainerExecutor.java:388)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:290)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
