Posted to mapreduce-dev@hadoop.apache.org by Bo Wang <bo...@cloudera.com> on 2012/09/01 02:12:19 UTC

Re: killApplication doesn't kill AppMaster

I created a JIRA for this:

https://issues.apache.org/jira/browse/YARN-76

On Thu, Aug 30, 2012 at 1:43 PM, Bo Wang <bo...@cloudera.com> wrote:

> The calling graph is very useful. Thanks, Vinod.
>
> I traced the code and enabled debug logging, and found something
> interesting.
>
> While the AM was running, I ran "ps aux | grep SampleAM" and found two
> running processes:
>
> 34990  /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java SampleAM
> 34984  /bin/bash -c /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java SampleAM 1>/tmp/logs/application_1346348588670_0011/container_1346348588670_0011_01_000001/stdout 2>/tmp/logs/application_1346348588670_0011/container_1346348588670_0011_01_000001/stderr
>
> After killing the application, I found the following in the NM log:
>
> 12/08/30 13:29:27,542 DEBUG [AsyncDispatcher event handler] nodemanager.DefaultContainerExecutor: Sending signal 15 to pid 34984 as user bo.wang
> 12/08/30 13:29:27,836 DEBUG [Task killer for 34984] nodemanager.DefaultContainerExecutor: Sending signal 9 to pid 34984 as user bo.wang
>
> It looks like the NM only signals process 34984 (the /bin/bash -c
> wrapper), not 34990 (the java process). As a result, process 34990 is
> still running after the kill.
>
> Is this a bug? BTW, I am running on my MacBook, which may be the reason
> YARN is using DefaultContainerExecutor rather than LinuxContainerExecutor.
>
> Thanks,
> Bo
>
>
> On Wed, Aug 29, 2012 at 5:23 PM, Vinod Kumar Vavilapalli <
> vinodkv@hortonworks.com> wrote:
>
>>
>> Please attach your jstack dump; maybe I can spot something.
>>
>> Pointer for what you asked: ContainerManagerImpl.stopContainer() ->
>> ContainerImpl.KillTransition -> ContainersLauncher ->
>> ContainerLaunch.cleanupContainer(). Follow the events carefully.
>>
>> HTH,
>> +Vinod
>>
>> On Aug 29, 2012, at 3:28 PM, Bo Wang wrote:
>>
>> > Hi Vinod,
>> >
>> > Thanks for the suggestion. I was involved with some other issues before
>> > getting back to this one. Sorry for replying late.
>> >
>> > I tried to kill the process with "kill -3" but it was not interrupted.
>> > Then I used "kill -9", which sent a SIGKILL, and the process was killed.
>> > I checked the stderr and used jstack to dump the stack trace. Things
>> > look normal. Actually, I simplified my test AM to be just an empty
>> > while loop.
>> >
>> > I looked into the code to find where YARN sends the SIGKILL but didn't
>> > find it. I traced down to NodeManager.stopContainer, but didn't see it
>> > there. Would you mind sending me a pointer to the actual code?
>> >
>> > Thanks,
>> > Bo
>> >
>> > On Wed, Aug 22, 2012 at 7:29 PM, Vinod Kumar Vavilapalli <
>> > vinodkv@hortonworks.com> wrote:
>> >
>> >>
>> >>> I am not sure when to grab the stack trace of the AM. In the
>> >>> stdout/stderr of AM, no stack trace (or exception) is emitted.
>> >>
>> >>
>> >> You can log in to the node and, if the process is still alive, run
>> >> "kill -3", which will dump the threads' status to stderr.
>> >>
>> >>
>> >>> Btw, I am curious how NM kills a container. Does it directly kill the
>> >>> JVM process?
>> >>
>> >>
>> >> NM directly kills the JVM with a SIGTERM followed by a SIGKILL.
>> >>
>> >> BTW, please also check the corresponding NM's logs for any
>> >> exception/error, which could indicate a bug in NM code.
>> >>
>> >> HTH,
>> >> +Vinod
>>
>>
>
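
The NM log quoted above shows both signal 15 and signal 9 going only to pid 34984, the /bin/bash -c wrapper, while the java process 34990 keeps running. The following is a minimal sketch of that effect, not YARN code: it is written in Python for brevity, it assumes a POSIX system with "sh" and "sleep" available, and "sleep 60" stands in for the AM. The function names (launch_wrapper, stop, anything_alive, demo) are hypothetical. It compares signalling only the wrapper's pid with signalling the wrapper's whole process group, which is one common way to reach the children too; whether YARN-76 was resolved this way is not stated in this thread.

```python
import os
import select
import signal
import subprocess


def launch_wrapper():
    """Start a shell wrapper that forks a long-running child.

    Stands in for `/bin/bash -c "java SampleAM ..."` and the java process.
    start_new_session=True makes the wrapper a process-group leader, so its
    pid is also the process-group id.
    """
    proc = subprocess.Popen(
        ["sh", "-c", "sleep 60 & echo ready; wait"],
        stdout=subprocess.PIPE,
        start_new_session=True,
    )
    proc.stdout.readline()  # block until the child has been forked
    return proc


def stop(proc, whole_group, grace=0.25):
    """Send SIGTERM, wait `grace` seconds, then SIGKILL what may be left."""
    send = os.killpg if whole_group else os.kill
    send(proc.pid, signal.SIGTERM)
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass
    # Pid-only mode: escalate only if the wrapper itself is still running.
    # Group mode: always escalate, a child may have ignored SIGTERM.
    if whole_group or proc.poll() is None:
        try:
            send(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # everything already exited on SIGTERM
    proc.wait()


def anything_alive(proc, timeout=1.0):
    """True if some process still holds the write end of proc.stdout.

    The wrapper and its child both inherit the pipe, so the read end only
    reaches EOF (and becomes readable) once both of them are gone.
    """
    readable, _, _ = select.select([proc.stdout], [], [], timeout)
    return not readable


def demo():
    survivors = {}
    for name, whole_group in (("pid_only", False), ("process_group", True)):
        proc = launch_wrapper()
        stop(proc, whole_group)
        survivors[name] = anything_alive(proc)
        if survivors[name]:
            os.killpg(proc.pid, signal.SIGKILL)  # clean up the orphaned child
        proc.stdout.close()
    return survivors


if __name__ == "__main__":
    print(demo())
```

Under those assumptions, the "pid_only" entry is expected to be True (the child outlives its wrapper, as 34990 did) and the "process_group" entry False. Liveness is checked through the inherited pipe rather than by probing a pid, so an orphaned child that has exited but not yet been reaped is not miscounted as running.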