Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2015/09/12 01:55:45 UTC

[jira] [Comment Edited] (TEZ-2798) NPE when executing TestMemoryWithEvents::testMemoryScatterGather

    [ https://issues.apache.org/jira/browse/TEZ-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741746#comment-14741746 ] 

Bikas Saha edited comment on TEZ-2798 at 9/11/15 11:55 PM:
-----------------------------------------------------------

I investigated this. 

The context passed to ContainerLauncherContext is null because it is incorrectly passed in the MockDAGAppMaster constructor while the context object is null. Whenever the launcher context methods are invoked, they NPE on its context member. So when the mock AM launches the mock container, there is an NPE and the container stays in the launching state.
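
To illustrate the failure mode described above, here is a minimal sketch of how a null reference stored in a constructor only surfaces later, when a method dereferences it. All class and method names below (AppCtxSketch, LauncherContextSketch, CheckedLauncherContextSketch) are hypothetical stand-ins, not the actual Tez classes, and the null check shown is just one possible guard, not necessarily the fix applied in Tez.

```java
import java.util.Objects;

// Hypothetical stand-in for the application context; not a Tez type.
interface AppCtxSketch {
    void containerLaunched(String containerId);
}

// Stores whatever it is given, so a null context goes unnoticed at construction.
class LauncherContextSketch {
    private final AppCtxSketch context;

    LauncherContextSketch(AppCtxSketch context) {
        this.context = context; // no validation: the failure is deferred
    }

    void containerLaunched(String containerId) {
        // Throws NullPointerException here if context was null at construction.
        context.containerLaunched(containerId);
    }
}

// Same shape, but fails fast in the constructor instead.
class CheckedLauncherContextSketch {
    private final AppCtxSketch context;

    CheckedLauncherContextSketch(AppCtxSketch context) {
        this.context = Objects.requireNonNull(context, "context must not be null");
    }

    void containerLaunched(String containerId) {
        context.containerLaunched(containerId);
    }
}

class NullContextDemo {
    public static void main(String[] args) {
        // Construction succeeds even though the context is null.
        LauncherContextSketch late = new LauncherContextSketch(null);
        try {
            late.containerLaunched("container_1");
        } catch (NullPointerException e) {
            System.out.println("NPE only when the method is invoked");
        }

        try {
            new CheckedLauncherContextSketch(null);
        } catch (NullPointerException e) {
            System.out.println("NPE at construction: " + e.getMessage());
        }
    }
}
```

With the unchecked variant, the stack trace points at the call site of the method rather than at the place where the null was handed in, which matches the shape of the trace in this issue.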

TEZ-2045 reversed the flow of sending the task spec to the communicator. A side effect is that the container lifecycle becomes disconnected from the task lifecycle. Even if the container is in the launching state, the rest of the task state machine can proceed because there are no further interactions with the AMContainer object after that (in the no-error case).

After the task completes, the local scheduler releases the container and the AMContainer transitions from launching to stopped. It NPEs again when the stop() callback is called, but the rest of the AM code and tests pass.

The NPEs are not crashing the AM because the AsyncDispatcher exit-on-error setting is false. In fact, the NPE should not be reaching the AsyncDispatcher at all, because the ContainerLauncherManager should catch exceptions thrown from service plugins when invoking their methods. In this case, ContainerLauncherManager should have caught the exception in the plugin.launchContainer() invocation. However, none of the plugin APIs actually throw an exception, so the framework code does not catch that exception and we end up ignoring errors. Creating TEZ-2815 to track that.
{code}java.lang.NullPointerException
	at org.apache.tez.dag.app.ContainerLauncherContextImpl.containerLaunched(ContainerLauncherContextImpl.java:47)
	at org.apache.tez.dag.app.MockDAGAppMaster$MockContainerLauncher.launch(MockDAGAppMaster.java:280)
	at org.apache.tez.dag.app.MockDAGAppMaster$MockContainerLauncher.launchContainer(MockDAGAppMaster.java:219)
	at org.apache.tez.dag.app.launcher.ContainerLauncherManager.handle(ContainerLauncherManager.java:200){code}
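
As a rough sketch of the behaviour argued for above (the manager catching whatever a plugin throws instead of letting it escape to the dispatcher thread), something like the following could work. The names LauncherPluginSketch, LauncherManagerSketch, handleLaunch and reportedErrors are hypothetical and exist only for this illustration; this is not the real Tez plugin API, nor the change made under TEZ-2815.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plugin interface: note that it declares no checked exceptions,
// so only unchecked exceptions can escape from an implementation.
interface LauncherPluginSketch {
    void launchContainer(String containerId);
}

// Hypothetical manager that guards every call into the plugin.
class LauncherManagerSketch {
    private final LauncherPluginSketch plugin;
    private final List<String> reportedErrors = new ArrayList<>();

    LauncherManagerSketch(LauncherPluginSketch plugin) {
        this.plugin = plugin;
    }

    void handleLaunch(String containerId) {
        try {
            plugin.launchContainer(containerId);
        } catch (RuntimeException e) {
            // Record the failure explicitly instead of letting it propagate
            // to a dispatcher that may be configured to log and continue.
            reportedErrors.add(containerId + ": " + e);
        }
    }

    List<String> reportedErrors() {
        return reportedErrors;
    }
}

class PluginErrorDemo {
    public static void main(String[] args) {
        // A plugin that fails the same way the mock launcher does in this issue.
        LauncherPluginSketch broken = id -> {
            throw new NullPointerException("context is null");
        };
        LauncherManagerSketch manager = new LauncherManagerSketch(broken);
        manager.handleLaunch("container_1");
        System.out.println(manager.reportedErrors());
    }
}
```

Catching RuntimeException here is a deliberate simplification: since the hypothetical interface declares no checked exceptions, unchecked ones are the only kind a plugin can throw, and the manager is the last place they can be turned into a visible error before reaching the dispatcher.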



> NPE when executing TestMemoryWithEvents::testMemoryScatterGather
> ----------------------------------------------------------------
>
>                 Key: TEZ-2798
>                 URL: https://issues.apache.org/jira/browse/TEZ-2798
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Priority: Blocker
>             Fix For: 0.8.1
>
>
> {noformat}
> 2015-09-10 05:07:45,885 ERROR [Dispatcher thread: Central] common.AsyncDispatcher (AsyncDispatcher.java:dispatch(188)) - Error in dispatcher thread
> java.lang.NullPointerException
> 	at org.apache.tez.dag.app.ContainerLauncherContextImpl.containerLaunched(ContainerLauncherContextImpl.java:47)
> 	at org.apache.tez.dag.app.MockDAGAppMaster$MockContainerLauncher.launch(MockDAGAppMaster.java:280)
> 	at org.apache.tez.dag.app.MockDAGAppMaster$MockContainerLauncher.launchContainer(MockDAGAppMaster.java:219)
> 	at org.apache.tez.dag.app.launcher.ContainerLauncherManager.handle(ContainerLauncherManager.java:200)
> 	at org.apache.tez.dag.app.launcher.ContainerLauncherManager.handle(ContainerLauncherManager.java:46)
> 	at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> 	at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This wasn't caught in Jenkins because these tests are very long-running and are marked as @Ignore (mainly for internal testing).
> The same exception occurs with testMemoryBroadcast, testMemoryOneToOne, and testMemoryRootInputEvents.


