Posted to hdfs-user@hadoop.apache.org by Krishna Kishore Bonagiri <wr...@gmail.com> on 2013/12/12 12:26:04 UTC

Yarn -- one of the daemons getting killed

Hi,
  I am running a small application on YARN (2.2.0) in a loop of 500 times,
and while doing so one of the daemons (node manager, resource manager, or
data node) gets killed, that is, it simply disappears, at a random point.
I see no information in the corresponding log files. How can I find out
why this is happening?

 And one more observation: this happens only when I use "*" as the node
name in the container requests; when I use a specific node name,
everything works fine.

Thanks,
Kishore
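
A quick way to see which daemon disappears, and when, is to record the
surviving Java daemons after every iteration of the loop. A rough sketch,
where run_my_yarn_app.sh stands in for the actual client invocation and
the log file name is arbitrary:

#!/usr/bin/env bash
# Run the client 500 times and log which Hadoop/YARN daemons are still
# alive after each iteration.
for i in $(seq 1 500); do
  ./run_my_yarn_app.sh            # placeholder for the real YARN client
  {
    echo "=== iteration $i: $(date) ==="
    jps -l | egrep 'NameNode|DataNode|ResourceManager|NodeManager'
  } >> daemon_check.log
done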

Re: Yarn -- one of the daemons getting killed

Posted by Adam Kawa <ka...@gmail.com>.
If you are interested, please read how we ran into an OOM-killer issue
that was killing our TaskTrackers (plus one issue related to heavy
swapping):
http://hakunamapdata.com/two-memory-related-issues-on-the-apache-hadoop-cluster/


2013/12/13 Vinod Kumar Vavilapalli <vi...@hortonworks.com>

> Yes, that is what I suspect. That is why I asked if everything is on a
> single node. If you are running linux, linux OOM killer may be shooting
> things down. When it happens, you will see something like "'killed process"
> in system's syslog.
>
> Thanks,
> +Vinod
>
> On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
> Vinod,
>
>   One more thing I observed is that, my Client which submits Application
> Master one after another continuously also gets killed sometimes. So, it is
> always any of the Java Processes that is getting killed. Does it indicate
> some excessive memory usage by them or something like that, that is causing
> them die? If so, how can we resolve this kind of issue?
>
> Thanks,
> Kishore
>
>
> On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
>> No, I am running on 2 node cluster.
>>
>>
>> On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli <
>> vinodkv@hortonworks.com> wrote:
>>
>>> Is all of this on a single node?
>>>
>>>  Thanks,
>>> +Vinod
>>>
>>> On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <
>>> write2kishore@gmail.com> wrote:
>>>
>>> Hi,
>>>   I am running a small application on YARN (2.2.0) in a loop of 500
>>> times, and while doing so one of the daemons, node manager, resource
>>> manager, or data node is getting killed (I mean disappearing) at a random
>>> point. I see no information in the corresponding log files. How can I know
>>> why is it happening so?
>>>
>>>  And, one more observation is that, this is happening only when I am
>>> using "*" for node name in the container requests, otherwise when I used a
>>> specific node name, everything is fine.
>>>
>>> Thanks,
>>> Kishore
>>>
>>>
>>>
>>
>>
>>
>
>
>
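
To confirm the OOM-killer theory that Vinod describes, the kernel log can
be grepped directly; the log file location varies by distribution
(/var/log/messages on RHEL-style systems, /var/log/syslog on
Debian/Ubuntu), so a rough check looks like:

# Kernel ring buffer
dmesg | egrep -i 'killed process|out of memory'
# Persistent syslog (path depends on the distribution)
egrep -i 'killed process|oom' /var/log/messages /var/log/syslog 2>/dev/null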

Re: Yarn -- one of the daemons getting killed

Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Hi Vinay,

  In the out files I could see nothing other than the output of ulimit -all.
Do I need to enable any other kind of logging to get more information?

Thanks,
Kishore


On Mon, Dec 16, 2013 at 5:41 PM, Vinayakumar B <vi...@huawei.com> wrote:

>  Hi Krishna,
>
>
>
> Please check the out files as well for daemons. You may find something.
>
>
>
>
>
> Cheers,
>
> Vinayakumar B
>
>
>
> *From:* Krishna Kishore Bonagiri [mailto:write2kishore@gmail.com]
> *Sent:* 16 December 2013 16:50
> *To:* user@hadoop.apache.org
> *Subject:* Re: Yarn -- one of the daemons getting killed
>
>
>
> Hi Vinod,
>
>
>
>  Yes, I am running on Linux.
>
>
>
>  I was actually searching for a corresponding message in /var/log/messages
> to confirm that OOM killed my daemons, but could not find any corresponding
> messages there! According to the following link, it looks like if it is a
> memory issue, I should see a messages even if OOM is disabled, but I don't
> see it.
>
>
>
> http://www.redhat.com/archives/taroon-list/2007-August/msg00006.html
>
>
>
>   And, is memory consumption more in case of two node cluster than a
> single node one? Also, I see this problem only when I give "*" as the node
> name.
>
>
>
>   One other thing I suspected was the allowed number of user processes, I
> increased that to 31000 from 1024 but that also didn't help.
>
>
>
> Thanks,
>
> Kishore
>
>
>
> On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli <
> vinodkv@hortonworks.com> wrote:
>
> Yes, that is what I suspect. That is why I asked if everything is on a
> single node. If you are running linux, linux OOM killer may be shooting
> things down. When it happens, you will see something like "'killed process"
> in system's syslog.
>
>
>
> Thanks,
>
> +Vinod
>
>
>
> On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
>
>
>  Vinod,
>
>
>
>   One more thing I observed is that, my Client which submits Application
> Master one after another continuously also gets killed sometimes. So, it is
> always any of the Java Processes that is getting killed. Does it indicate
> some excessive memory usage by them or something like that, that is causing
> them die? If so, how can we resolve this kind of issue?
>
>
>
> Thanks,
>
> Kishore
>
>
>
> On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
> No, I am running on 2 node cluster.
>
>
>
> On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli <
> vinodkv@hortonworks.com> wrote:
>
> Is all of this on a single node?
>
>
>
> Thanks,
>
> +Vinod
>
>
>
> On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
>
>
>  Hi,
>
>   I am running a small application on YARN (2.2.0) in a loop of 500 times,
> and while doing so one of the daemons, node manager, resource manager, or
> data node is getting killed (I mean disappearing) at a random point. I see
> no information in the corresponding log files. How can I know why is it
> happening so?
>
>
>
>  And, one more observation is that, this is happening only when I am using
> "*" for node name in the container requests, otherwise when I used a
> specific node name, everything is fine.
>
>
>
> Thanks,
>
> Kishore
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>

RE: Yarn -- one of the daemons getting killed

Posted by Vinayakumar B <vi...@huawei.com>.
Hi Krishna,

Please check the out files as well for daemons. You may find something.


Cheers,
Vinayakumar B
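
The .out files are where the start scripts send each daemon's stdout and
stderr, next to the .log files. A minimal check, assuming the default log
directory and file naming (adjust the paths for the actual install):

# Most recently written .out files
ls -lt $HADOOP_HOME/logs/*.out | head
# Last lines of the ResourceManager and NodeManager .out files
tail -n 100 $HADOOP_HOME/logs/yarn-*-resourcemanager-*.out
tail -n 100 $HADOOP_HOME/logs/yarn-*-nodemanager-*.out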

From: Krishna Kishore Bonagiri [mailto:write2kishore@gmail.com]
Sent: 16 December 2013 16:50
To: user@hadoop.apache.org
Subject: Re: Yarn -- one of the daemons getting killed

Hi Vinod,

 Yes, I am running on Linux.

 I was actually searching for a corresponding message in /var/log/messages to confirm that OOM killed my daemons, but could not find any corresponding messages there! According to the following link, it looks like if it is a memory issue, I should see a messages even if OOM is disabled, but I don't see it.

http://www.redhat.com/archives/taroon-list/2007-August/msg00006.html

  And, is memory consumption more in case of two node cluster than a single node one? Also, I see this problem only when I give "*" as the node name.

  One other thing I suspected was the allowed number of user processes, I increased that to 31000 from 1024 but that also didn't help.
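
For reference, the per-user process limit mentioned here can be inspected
and raised roughly as follows; the 32768 value and user names are only
illustrative:

# Current nproc limit for the current user
ulimit -u
# Persistent change via /etc/security/limits.conf (illustrative entries):
#   hdfs  -  nproc  32768
#   yarn  -  nproc  32768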

Thanks,
Kishore

On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli <vi...@hortonworks.com>> wrote:
Yes, that is what I suspect. That is why I asked if everything is on a single node. If you are running linux, linux OOM killer may be shooting things down. When it happens, you will see something like "'killed process" in system's syslog.

Thanks,
+Vinod

On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri <wr...@gmail.com>> wrote:


Vinod,

  One more thing I observed is that, my Client which submits Application Master one after another continuously also gets killed sometimes. So, it is always any of the Java Processes that is getting killed. Does it indicate some excessive memory usage by them or something like that, that is causing them die? If so, how can we resolve this kind of issue?

Thanks,
Kishore

On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri <wr...@gmail.com>> wrote:
No, I am running on 2 node cluster.

On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli <vi...@hortonworks.com>> wrote:
Is all of this on a single node?

Thanks,
+Vinod

On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <wr...@gmail.com>> wrote:


Hi,
  I am running a small application on YARN (2.2.0) in a loop of 500 times, and while doing so one of the daemons, node manager, resource manager, or data node is getting killed (I mean disappearing) at a random point. I see no information in the corresponding log files. How can I know why is it happening so?

 And, one more observation is that, this is happening only when I am using "*" for node name in the container requests, otherwise when I used a specific node name, everything is fine.

Thanks,
Kishore




Re: Yarn -- one of the daemons getting killed

Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Hi Vinod,
 Thanks for the link. I went through it, and it looks like the OOM killer
picks the process with the highest oom_score. I have tried to capture the
oom_score of all the YARN daemon processes after each run of my
application. The first time I captured these details, I saw that the name
node was killed, whereas the Node Manager had the highest score. So I
don't know whether it really is the OOM killer that killed it!

 Please see the attached output of my run, which also includes the output
of the free command after each run. The free output doesn't show any
exhaustion of system memory either.

Also, one more thing I did today: I added audit rules for each of the
daemons to capture all their system calls. In the audit log, I see the
futex() system call occurring in the killed daemon processes. I don't
know whether it causes the daemon to die, or why that call happens at all.


Thanks,
Kishore
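
The per-run capture described above can be scripted along these lines
(the daemon names and the history file name are just examples):

# Append the oom_score of each Hadoop/YARN daemon, plus overall memory
# usage, to a history file after every run of the application.
{
  echo "=== $(date) ==="
  jps | egrep 'NameNode|DataNode|ResourceManager|NodeManager' | \
    while read pid name; do
      echo "$name pid=$pid oom_score=$(cat /proc/$pid/oom_score)"
    done
  free -m
} >> oom_score_history.log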


On Wed, Dec 18, 2013 at 12:31 AM, Vinod Kumar Vavilapalli <
vinodkv@hortonworks.com> wrote:

> That's good info. It is more than likely that it is the OOM killer. See
> http://stackoverflow.com/questions/726690/who-killed-my-process-and-why for example.
>
> Thanks,
> +Vinod
>
> On Dec 17, 2013, at 1:26 AM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
> Hi Jeff,
>
>   I have run the resource manager in the foreground without nohup and here
> are the messages when it was killed, it says it is "Killed" but doesn't say
> why!
>
> 13/12/17 03:14:54 INFO capacity.CapacityScheduler: Application
> appattempt_1387266015651_0258_000001 released container
> container_1387266015651_0258_01_000003 on node: host: isredeng:36576
> #containers=2 available=7936 used=256 with event: FINISHED
> 13/12/17 03:14:54 INFO rmcontainer.RMContainerImpl:
> container_1387266015651_0258_01_000005 Container Transitioned from ACQUIRED
> to RUNNING
> Killed
>
>
> Thanks,
> Kishore
>
>
> On Mon, Dec 16, 2013 at 11:10 PM, Jeff Stuckman <st...@umd.edu> wrote:
>
>>  What if you open the daemons in a "screen" session rather than running
>> them in the background -- for example, run "yarn resourcemanager". Then you
>> can see exactly when they terminate, and hopefully why.
>>
>>    *From: *Krishna Kishore Bonagiri
>> *Sent: *Monday, December 16, 2013 6:20 AM
>> *To: *user@hadoop.apache.org
>> *Reply To: *user@hadoop.apache.org
>> *Subject: *Re: Yarn -- one of the daemons getting killed
>>
>>  Hi Vinod,
>>
>>   Yes, I am running on Linux.
>>
>>  I was actually searching for a corresponding message in
>> /var/log/messages to confirm that OOM killed my daemons, but could not find
>> any corresponding messages there! According to the following link, it looks
>> like if it is a memory issue, I should see a messages even if OOM is
>> disabled, but I don't see it.
>>
>>  http://www.redhat.com/archives/taroon-list/2007-August/msg00006.html
>>
>>    And, is memory consumption more in case of two node cluster than a
>> single node one? Also, I see this problem only when I give "*" as the node
>> name.
>>
>>    One other thing I suspected was the allowed number of user processes,
>> I increased that to 31000 from 1024 but that also didn't help.
>>
>>  Thanks,
>> Kishore
>>
>>
>> On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli <
>> vinodkv@hortonworks.com> wrote:
>>
>>> Yes, that is what I suspect. That is why I asked if everything is on a
>>> single node. If you are running linux, linux OOM killer may be shooting
>>> things down. When it happens, you will see something like "'killed process"
>>> in system's syslog.
>>>
>>>    Thanks,
>>> +Vinod
>>>
>>>  On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri <
>>> write2kishore@gmail.com> wrote:
>>>
>>>  Vinod,
>>>
>>>   One more thing I observed is that, my Client which submits Application
>>> Master one after another continuously also gets killed sometimes. So, it is
>>> always any of the Java Processes that is getting killed. Does it indicate
>>> some excessive memory usage by them or something like that, that is causing
>>> them die? If so, how can we resolve this kind of issue?
>>>
>>>  Thanks,
>>> Kishore
>>>
>>>
>>> On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri <
>>> write2kishore@gmail.com> wrote:
>>>
>>>> No, I am running on 2 node cluster.
>>>>
>>>>
>>>> On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli <
>>>> vinodkv@hortonworks.com> wrote:
>>>>
>>>>> Is all of this on a single node?
>>>>>
>>>>>   Thanks,
>>>>> +Vinod
>>>>>
>>>>>  On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <
>>>>> write2kishore@gmail.com> wrote:
>>>>>
>>>>>  Hi,
>>>>>   I am running a small application on YARN (2.2.0) in a loop of 500
>>>>> times, and while doing so one of the daemons, node manager, resource
>>>>> manager, or data node is getting killed (I mean disappearing) at a random
>>>>> point. I see no information in the corresponding log files. How can I know
>>>>> why is it happening so?
>>>>>
>>>>>   And, one more observation is that, this is happening only when I am
>>>>> using "*" for node name in the container requests, otherwise when I used a
>>>>> specific node name, everything is fine.
>>>>>
>>>>>  Thanks,
>>>>> Kishore
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>

>>>>> point. I see no information in the corresponding log files. How can I know
>>>>> why is it happening so?
>>>>>
>>>>>   And, one more observation is that, this is happening only when I am
>>>>> using "*" for node name in the container requests, otherwise when I used a
>>>>> specific node name, everything is fine.
>>>>>
>>>>>  Thanks,
>>>>> Kishore
>>>>>
>>>>>
>>>>>

Re: Yarn -- one of the daemons getting killed

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
That's good info. It is more than likely that it is the OOM killer. See http://stackoverflow.com/questions/726690/who-killed-my-process-and-why for example.

Thanks,
+Vinod

On Dec 17, 2013, at 1:26 AM, Krishna Kishore Bonagiri <wr...@gmail.com> wrote:

> Hi Jeff,
> 
>   I have run the resource manager in the foreground without nohup and here are the messages when it was killed, it says it is "Killed" but doesn't say why!
> 
> 13/12/17 03:14:54 INFO capacity.CapacityScheduler: Application appattempt_1387266015651_0258_000001 released container container_1387266015651_0258_01_000003 on node: host: isredeng:36576 #containers=2 available=7936 used=256 with event: FINISHED
> 13/12/17 03:14:54 INFO rmcontainer.RMContainerImpl: container_1387266015651_0258_01_000005 Container Transitioned from ACQUIRED to RUNNING
> Killed
> 
> 
> Thanks,
> Kishore
> 
> 
> On Mon, Dec 16, 2013 at 11:10 PM, Jeff Stuckman <st...@umd.edu> wrote:
> What if you open the daemons in a "screen" session rather than running them in the background -- for example, run "yarn resourcemanager". Then you can see exactly when they terminate, and hopefully why.
> 
> From: Krishna Kishore Bonagiri
> Sent: Monday, December 16, 2013 6:20 AM
> To: user@hadoop.apache.org
> Reply To: user@hadoop.apache.org
> Subject: Re: Yarn -- one of the daemons getting killed
> 
> Hi Vinod,
> 
>  Yes, I am running on Linux.
> 
>  I was actually searching for a corresponding message in /var/log/messages to confirm that OOM killed my daemons, but could not find any corresponding messages there! According to the following link, it looks like if it is a memory issue, I should see a messages even if OOM is disabled, but I don't see it.
> 
> http://www.redhat.com/archives/taroon-list/2007-August/msg00006.html
> 
>   And, is memory consumption more in case of two node cluster than a single node one? Also, I see this problem only when I give "*" as the node name. 
> 
>   One other thing I suspected was the allowed number of user processes, I increased that to 31000 from 1024 but that also didn't help.
> 
> Thanks,
> Kishore
> 
> 
> On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli <vi...@hortonworks.com> wrote:
> Yes, that is what I suspect. That is why I asked if everything is on a single node. If you are running linux, linux OOM killer may be shooting things down. When it happens, you will see something like "'killed process" in system's syslog.
> 
> Thanks,
> +Vinod
> 
> On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri <wr...@gmail.com> wrote:
> 
>> Vinod,
>> 
>>   One more thing I observed is that, my Client which submits Application Master one after another continuously also gets killed sometimes. So, it is always any of the Java Processes that is getting killed. Does it indicate some excessive memory usage by them or something like that, that is causing them die? If so, how can we resolve this kind of issue?
>> 
>> Thanks,
>> Kishore
>> 
>> 
>> On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri <wr...@gmail.com> wrote:
>> No, I am running on 2 node cluster.
>> 
>> 
>> On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli <vi...@hortonworks.com> wrote:
>> Is all of this on a single node?
>> 
>> Thanks,
>> +Vinod
>> 
>> On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <wr...@gmail.com> wrote:
>> 
>>> Hi,
>>>   I am running a small application on YARN (2.2.0) in a loop of 500 times, and while doing so one of the daemons, node manager, resource manager, or data node is getting killed (I mean disappearing) at a random point. I see no information in the corresponding log files. How can I know why is it happening so?
>>> 
>>>  And, one more observation is that, this is happening only when I am using "*" for node name in the container requests, otherwise when I used a specific node name, everything is fine.
>>> 
>>> Thanks,
>>> Kishore
>> 
>> 


Re: Yarn -- one of the daemons getting killed

Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Hi Jeff,

  I have run the resource manager in the foreground without nohup, and here
are the messages from when it was killed; it just says "Killed" but doesn't
say why!

13/12/17 03:14:54 INFO capacity.CapacityScheduler: Application
appattempt_1387266015651_0258_000001 released container
container_1387266015651_0258_01_000003 on node: host: isredeng:36576
#containers=2 available=7936 used=256 with event: FINISHED
13/12/17 03:14:54 INFO rmcontainer.RMContainerImpl:
container_1387266015651_0258_01_000005 Container Transitioned from ACQUIRED
to RUNNING
Killed


Thanks,
Kishore
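
(A quick way to cross-check what happened here, since a bare "Killed" from the
shell means the process received SIGKILL, which a JVM cannot catch or log. The
commands below are only a sketch and assume a stock syslog/dmesg setup:)

  # Look for the kernel's own record of an OOM kill
  dmesg | grep -iE 'killed process|out of memory|oom'
  grep -iE 'killed process|oom' /var/log/messages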


On Mon, Dec 16, 2013 at 11:10 PM, Jeff Stuckman <st...@umd.edu> wrote:

>  What if you open the daemons in a "screen" session rather than running
> them in the background -- for example, run "yarn resourcemanager". Then you
> can see exactly when they terminate, and hopefully why.
>
>    *From: *Krishna Kishore Bonagiri
> *Sent: *Monday, December 16, 2013 6:20 AM
> *To: *user@hadoop.apache.org
> *Reply To: *user@hadoop.apache.org
> *Subject: *Re: Yarn -- one of the daemons getting killed
>
>  Hi Vinod,
>
>   Yes, I am running on Linux.
>
>  I was actually searching for a corresponding message in /var/log/messages
> to confirm that OOM killed my daemons, but could not find any corresponding
> messages there! According to the following link, it looks like if it is a
> memory issue, I should see a messages even if OOM is disabled, but I don't
> see it.
>
>  http://www.redhat.com/archives/taroon-list/2007-August/msg00006.html
>
>    And, is memory consumption more in case of two node cluster than a
> single node one? Also, I see this problem only when I give "*" as the node
> name.
>
>    One other thing I suspected was the allowed number of user processes,
> I increased that to 31000 from 1024 but that also didn't help.
>
>  Thanks,
> Kishore
>
>
> On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli <
> vinodkv@hortonworks.com> wrote:
>
>> Yes, that is what I suspect. That is why I asked if everything is on a
>> single node. If you are running linux, linux OOM killer may be shooting
>> things down. When it happens, you will see something like "'killed process"
>> in system's syslog.
>>
>>    Thanks,
>> +Vinod
>>
>>  On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri <
>> write2kishore@gmail.com> wrote:
>>
>>  Vinod,
>>
>>   One more thing I observed is that, my Client which submits Application
>> Master one after another continuously also gets killed sometimes. So, it is
>> always any of the Java Processes that is getting killed. Does it indicate
>> some excessive memory usage by them or something like that, that is causing
>> them die? If so, how can we resolve this kind of issue?
>>
>>  Thanks,
>> Kishore
>>
>>
>> On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri <
>> write2kishore@gmail.com> wrote:
>>
>>> No, I am running on 2 node cluster.
>>>
>>>
>>> On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli <
>>> vinodkv@hortonworks.com> wrote:
>>>
>>>> Is all of this on a single node?
>>>>
>>>>   Thanks,
>>>> +Vinod
>>>>
>>>>  On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <
>>>> write2kishore@gmail.com> wrote:
>>>>
>>>>  Hi,
>>>>   I am running a small application on YARN (2.2.0) in a loop of 500
>>>> times, and while doing so one of the daemons, node manager, resource
>>>> manager, or data node is getting killed (I mean disappearing) at a random
>>>> point. I see no information in the corresponding log files. How can I know
>>>> why is it happening so?
>>>>
>>>>   And, one more observation is that, this is happening only when I am
>>>> using "*" for node name in the container requests, otherwise when I used a
>>>> specific node name, everything is fine.
>>>>
>>>>  Thanks,
>>>> Kishore
>>>>
>>>>
>>>>

RE: Yarn -- one of the daemons getting killed

Posted by java8964 <ja...@hotmail.com>.
If it is not being killed by the OOM killer, maybe the JVM dumped core for some other reason. Search /var/log/messages for a core dump message for the process, or look for a core dump file on your system.
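
(A sketch of how to check that, assuming HADOOP_HOME is set and default dump
locations; note that a crashing JVM usually also leaves an hs_err_pid<pid>.log
fatal-error file in the daemon's working directory:)

  # Are core dumps even enabled for this user, and where would they be written?
  ulimit -c                          # 0 means no core files are produced
  cat /proc/sys/kernel/core_pattern  # how/where the kernel names core files
  # Look for core files and JVM fatal-error logs near the Hadoop install
  find $HADOOP_HOME /tmp -maxdepth 2 \( -name 'core*' -o -name 'hs_err_pid*.log' \) 2>/dev/null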
From: stuckman@umd.edu
To: user@hadoop.apache.org; user@hadoop.apache.org
Subject: Re: Yarn -- one of the daemons getting killed
Date: Mon, 16 Dec 2013 17:40:10 +0000

What if you open the daemons in a "screen" session rather than running them in the background -- for example, run "yarn resourcemanager". Then you can see exactly when they terminate, and hopefully why.

From: Krishna Kishore Bonagiri
Sent: Monday, December 16, 2013 6:20 AM
To: user@hadoop.apache.org
Reply To: user@hadoop.apache.org
Subject: Re: Yarn -- one of the daemons getting killed

Hi Vinod,

 Yes, I am running on Linux.

 I was actually searching for a corresponding message in /var/log/messages to confirm that OOM killed my daemons, but could not find any corresponding messages there! According to the following link, it looks like if it is a memory issue, I should see a messages even if OOM is disabled, but I don't see it.

http://www.redhat.com/archives/taroon-list/2007-August/msg00006.html

  And, is memory consumption more in case of two node cluster than a single node one? Also, I see this problem only when I give "*" as the node name.

  One other thing I suspected was the allowed number of user processes, I increased that to 31000 from 1024 but that also didn't help.

Thanks,
Kishore


On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli
<vi...@hortonworks.com> wrote:

Yes, that is what I suspect. That is why I asked if everything is on a single node. If you are running linux, linux OOM killer may be shooting things down. When it happens, you will see something like "'killed process" in system's syslog.

Thanks,
+Vinod

On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri <wr...@gmail.com> wrote:

Vinod,

  One more thing I observed is that, my Client which submits Application Master one after another continuously also gets killed sometimes. So, it is always any of the Java Processes that is getting killed. Does it indicate some excessive memory usage by them or something like that, that is causing them die? If so, how can we resolve this kind of issue?

Thanks,
Kishore

On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri
<wr...@gmail.com> wrote:

No, I am running on 2 node cluster.

On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli
<vi...@hortonworks.com> wrote:

Is all of this on a single node?

Thanks,
+Vinod

On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <wr...@gmail.com> wrote:

Hi,
  I am running a small application on YARN (2.2.0) in a loop of 500 times, and while doing so one of the daemons, node manager, resource manager, or data node is getting killed (I mean disappearing) at a random point. I see no information in the corresponding log files. How can I know why is it happening so?

 And, one more observation is that, this is happening only when I am using "*" for node name in the container requests, otherwise when I used a specific node name, everything is fine.

Thanks,
Kishore



Re: Yarn -- one of the daemons getting killed

Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Hi Jeff,

  I have run the resource manager in the foreground without nohup, and here are the messages from when it was killed. It just says "Killed" but doesn't say why!

13/12/17 03:14:54 INFO capacity.CapacityScheduler: Application
appattempt_1387266015651_0258_000001 released container
container_1387266015651_0258_01_000003 on node: host: isredeng:36576
#containers=2 available=7936 used=256 with event: FINISHED
13/12/17 03:14:54 INFO rmcontainer.RMContainerImpl:
container_1387266015651_0258_01_000005 Container Transitioned from ACQUIRED
to RUNNING
Killed
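
One way to find out which process sent that SIGKILL is the kernel audit subsystem. This is an untested sketch; it assumes auditd is installed and running on an x86_64 kernel, and it only catches kills issued through the plain kill() syscall:

# log every kill() syscall that delivers SIGKILL (signal 9)
sudo auditctl -a always,exit -F arch=b64 -S kill -F a1=9 -k kill_trace

# after the daemon dies, see which process issued the kill
sudo ausearch -k kill_trace -i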


Thanks,
Kishore


On Mon, Dec 16, 2013 at 11:10 PM, Jeff Stuckman <st...@umd.edu> wrote:

>  What if you open the daemons in a "screen" session rather than running
> them in the background -- for example, run "yarn resourcemanager". Then you
> can see exactly when they terminate, and hopefully why.
>
>    *From: *Krishna Kishore Bonagiri
> *Sent: *Monday, December 16, 2013 6:20 AM
> *To: *user@hadoop.apache.org
> *Reply To: *user@hadoop.apache.org
> *Subject: *Re: Yarn -- one of the daemons getting killed
>
>  Hi Vinod,
>
>   Yes, I am running on Linux.
>
>  I was actually searching for a corresponding message in /var/log/messages
> to confirm that OOM killed my daemons, but could not find any corresponding
> messages there! According to the following link, it looks like if it is a
> memory issue, I should see a messages even if OOM is disabled, but I don't
> see it.
>
>  http://www.redhat.com/archives/taroon-list/2007-August/msg00006.html
>
>    And, is memory consumption more in case of two node cluster than a
> single node one? Also, I see this problem only when I give "*" as the node
> name.
>
>    One other thing I suspected was the allowed number of user processes,
> I increased that to 31000 from 1024 but that also didn't help.
>
>  Thanks,
> Kishore
>
>
> On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli <
> vinodkv@hortonworks.com> wrote:
>
>> Yes, that is what I suspect. That is why I asked if everything is on a
>> single node. If you are running linux, linux OOM killer may be shooting
>> things down. When it happens, you will see something like "'killed process"
>> in system's syslog.
>>
>>    Thanks,
>> +Vinod
>>
>>  On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri <
>> write2kishore@gmail.com> wrote:
>>
>>  Vinod,
>>
>>   One more thing I observed is that, my Client which submits Application
>> Master one after another continuously also gets killed sometimes. So, it is
>> always any of the Java Processes that is getting killed. Does it indicate
>> some excessive memory usage by them or something like that, that is causing
>> them die? If so, how can we resolve this kind of issue?
>>
>>  Thanks,
>> Kishore
>>
>>
>> On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri <
>> write2kishore@gmail.com> wrote:
>>
>>> No, I am running on 2 node cluster.
>>>
>>>
>>> On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli <
>>> vinodkv@hortonworks.com> wrote:
>>>
>>>> Is all of this on a single node?
>>>>
>>>>   Thanks,
>>>> +Vinod
>>>>
>>>>  On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <
>>>> write2kishore@gmail.com> wrote:
>>>>
>>>>  Hi,
>>>>   I am running a small application on YARN (2.2.0) in a loop of 500
>>>> times, and while doing so one of the daemons, node manager, resource
>>>> manager, or data node is getting killed (I mean disappearing) at a random
>>>> point. I see no information in the corresponding log files. How can I know
>>>> why is it happening so?
>>>>
>>>>   And, one more observation is that, this is happening only when I am
>>>> using "*" for node name in the container requests, otherwise when I used a
>>>> specific node name, everything is fine.
>>>>
>>>>  Thanks,
>>>> Kishore
>>>>
>>>>
>>>>
>>>> CONFIDENTIALITY NOTICE
>>>> NOTICE: This message is intended for the use of the individual or
>>>> entity to which it is addressed and may contain information that is
>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>> If the reader of this message is not the intended recipient, you are hereby
>>>> notified that any printing, copying, dissemination, distribution,
>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>> you have received this communication in error, please contact the sender
>>>> immediately and delete it from your system. Thank You.
>>>
>>>
>>>
>>
>>
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity
>> to which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.
>>
>
>

Re: Yarn -- one of the daemons getting killed

Posted by Jeff Stuckman <st...@umd.edu>.
What if you open the daemons in a "screen" session rather than running them in the background -- for example, run "yarn resourcemanager". Then you can see exactly when they terminate, and hopefully why.
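
For example, something along these lines (the session name is arbitrary):

# start a named screen session and run the daemon in the foreground
screen -S resourcemanager
yarn resourcemanager

# detach with Ctrl-a d; reattach later to see when and why it exited
screen -r resourcemanager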

From: Krishna Kishore Bonagiri
Sent: Monday, December 16, 2013 6:20 AM
To: user@hadoop.apache.org
Reply To: user@hadoop.apache.org
Subject: Re: Yarn -- one of the daemons getting killed


Hi Vinod,

 Yes, I am running on Linux.

 I was actually searching for a corresponding message in /var/log/messages to confirm that OOM killed my daemons, but could not find any corresponding messages there! According to the following link, it looks like if it is a memory issue, I should see a messages even if OOM is disabled, but I don't see it.

http://www.redhat.com/archives/taroon-list/2007-August/msg00006.html

  And, is memory consumption more in case of two node cluster than a single node one? Also, I see this problem only when I give "*" as the node name.

  One other thing I suspected was the allowed number of user processes, I increased that to 31000 from 1024 but that also didn't help.

Thanks,
Kishore


On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli <vi...@hortonworks.com>> wrote:
Yes, that is what I suspect. That is why I asked if everything is on a single node. If you are running linux, linux OOM killer may be shooting things down. When it happens, you will see something like "'killed process" in system's syslog.

Thanks,
+Vinod

On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri <wr...@gmail.com>> wrote:

Vinod,

  One more thing I observed is that, my Client which submits Application Master one after another continuously also gets killed sometimes. So, it is always any of the Java Processes that is getting killed. Does it indicate some excessive memory usage by them or something like that, that is causing them die? If so, how can we resolve this kind of issue?

Thanks,
Kishore


On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri <wr...@gmail.com>> wrote:
No, I am running on 2 node cluster.


On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli <vi...@hortonworks.com>> wrote:
Is all of this on a single node?

Thanks,
+Vinod

On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <wr...@gmail.com>> wrote:

Hi,
  I am running a small application on YARN (2.2.0) in a loop of 500 times, and while doing so one of the daemons, node manager, resource manager, or data node is getting killed (I mean disappearing) at a random point. I see no information in the corresponding log files. How can I know why is it happening so?

 And, one more observation is that, this is happening only when I am using "*" for node name in the container requests, otherwise when I used a specific node name, everything is fine.

Thanks,
Kishore


CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.






RE: Yarn -- one of the daemons getting killed

Posted by Vinayakumar B <vi...@huawei.com>.
Hi Krishna,

Please check the .out files of the daemons as well. You may find something there.
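
For example, assuming the default log directory (adjust if HADOOP_LOG_DIR or YARN_LOG_DIR points elsewhere):

# the .out files capture the daemons' stdout/stderr, including messages
# printed outside log4j (JVM errors, ulimit output, etc.)
ls -l $HADOOP_HOME/logs/*.out
tail -n 50 $HADOOP_HOME/logs/yarn-*-resourcemanager-*.out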


Cheers,
Vinayakumar B

From: Krishna Kishore Bonagiri [mailto:write2kishore@gmail.com]
Sent: 16 December 2013 16:50
To: user@hadoop.apache.org
Subject: Re: Yarn -- one of the daemons getting killed

Hi Vinod,

 Yes, I am running on Linux.

 I was actually searching for a corresponding message in /var/log/messages to confirm that OOM killed my daemons, but could not find any corresponding messages there! According to the following link, it looks like if it is a memory issue, I should see a messages even if OOM is disabled, but I don't see it.

http://www.redhat.com/archives/taroon-list/2007-August/msg00006.html

  And, is memory consumption more in case of two node cluster than a single node one? Also, I see this problem only when I give "*" as the node name.

  One other thing I suspected was the allowed number of user processes, I increased that to 31000 from 1024 but that also didn't help.

Thanks,
Kishore

On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli <vi...@hortonworks.com>> wrote:
Yes, that is what I suspect. That is why I asked if everything is on a single node. If you are running linux, linux OOM killer may be shooting things down. When it happens, you will see something like "'killed process" in system's syslog.

Thanks,
+Vinod

On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri <wr...@gmail.com>> wrote:


Vinod,

  One more thing I observed is that, my Client which submits Application Master one after another continuously also gets killed sometimes. So, it is always any of the Java Processes that is getting killed. Does it indicate some excessive memory usage by them or something like that, that is causing them die? If so, how can we resolve this kind of issue?

Thanks,
Kishore

On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri <wr...@gmail.com>> wrote:
No, I am running on 2 node cluster.

On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli <vi...@hortonworks.com>> wrote:
Is all of this on a single node?

Thanks,
+Vinod

On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <wr...@gmail.com>> wrote:


Hi,
  I am running a small application on YARN (2.2.0) in a loop of 500 times, and while doing so one of the daemons, node manager, resource manager, or data node is getting killed (I mean disappearing) at a random point. I see no information in the corresponding log files. How can I know why is it happening so?

 And, one more observation is that, this is happening only when I am using "*" for node name in the container requests, otherwise when I used a specific node name, everything is fine.

Thanks,
Kishore


CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.






Re: Yarn -- one of the daemons getting killed

Posted by Adam Kawa <ka...@gmail.com>.
If you are interested, please read how we ran into an OOM-killer issue that was killing our TaskTrackers (plus one issue related to heavy swapping):
http://hakunamapdata.com/two-memory-related-issues-on-the-apache-hadoop-cluster/
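
As a quick local check (a rough sketch, not taken from the post above), you can see how attractive each daemon looks to the OOM killer and watch swap activity while the job loop runs:

# higher oom_score = more likely to be picked by the OOM killer
for p in $(pgrep -f ResourceManager); do
  echo "pid $p oom_score $(cat /proc/$p/oom_score)"
done

# watch swap-in/swap-out (si/so columns) every 5 seconds
vmstat 5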


2013/12/13 Vinod Kumar Vavilapalli <vi...@hortonworks.com>

> Yes, that is what I suspect. That is why I asked if everything is on a
> single node. If you are running linux, linux OOM killer may be shooting
> things down. When it happens, you will see something like "'killed process"
> in system's syslog.
>
> Thanks,
> +Vinod
>
> On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
> Vinod,
>
>   One more thing I observed is that, my Client which submits Application
> Master one after another continuously also gets killed sometimes. So, it is
> always any of the Java Processes that is getting killed. Does it indicate
> some excessive memory usage by them or something like that, that is causing
> them die? If so, how can we resolve this kind of issue?
>
> Thanks,
> Kishore
>
>
> On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
>> No, I am running on 2 node cluster.
>>
>>
>> On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli <
>> vinodkv@hortonworks.com> wrote:
>>
>>> Is all of this on a single node?
>>>
>>>  Thanks,
>>> +Vinod
>>>
>>> On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <
>>> write2kishore@gmail.com> wrote:
>>>
>>> Hi,
>>>   I am running a small application on YARN (2.2.0) in a loop of 500
>>> times, and while doing so one of the daemons, node manager, resource
>>> manager, or data node is getting killed (I mean disappearing) at a random
>>> point. I see no information in the corresponding log files. How can I know
>>> why is it happening so?
>>>
>>>  And, one more observation is that, this is happening only when I am
>>> using "*" for node name in the container requests, otherwise when I used a
>>> specific node name, everything is fine.
>>>
>>> Thanks,
>>> Kishore
>>>
>>>
>>>
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or entity
>>> to which it is addressed and may contain information that is confidential,
>>> privileged and exempt from disclosure under applicable law. If the reader
>>> of this message is not the intended recipient, you are hereby notified that
>>> any printing, copying, dissemination, distribution, disclosure or
>>> forwarding of this communication is strictly prohibited. If you have
>>> received this communication in error, please contact the sender immediately
>>> and delete it from your system. Thank You.
>>
>>
>>
>
>
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity
> to which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>

Re: Yarn -- one of the daemons getting killed

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
Yes, that is what I suspect. That is why I asked if everything is on a single node. If you are running linux, linux OOM killer may be shooting things down. When it happens, you will see something like "'killed process" in system's syslog.

Thanks,
+Vinod

On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri <wr...@gmail.com> wrote:

> Vinod,
> 
>   One more thing I observed is that, my Client which submits Application Master one after another continuously also gets killed sometimes. So, it is always any of the Java Processes that is getting killed. Does it indicate some excessive memory usage by them or something like that, that is causing them die? If so, how can we resolve this kind of issue?
> 
> Thanks,
> Kishore
> 
> 
> On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri <wr...@gmail.com> wrote:
> No, I am running on 2 node cluster.
> 
> 
> On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli <vi...@hortonworks.com> wrote:
> Is all of this on a single node?
> 
> Thanks,
> +Vinod
> 
> On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <wr...@gmail.com> wrote:
> 
>> Hi,
>>   I am running a small application on YARN (2.2.0) in a loop of 500 times, and while doing so one of the daemons, node manager, resource manager, or data node is getting killed (I mean disappearing) at a random point. I see no information in the corresponding log files. How can I know why is it happening so?
>> 
>>  And, one more observation is that, this is happening only when I am using "*" for node name in the container requests, otherwise when I used a specific node name, everything is fine.
>> 
>> Thanks,
>> Kishore
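
A quick way to confirm the OOM-killer theory above is to scan the kernel log
for its messages. The following is only a minimal sketch, not something from
the thread itself; the log paths are assumptions that differ by distribution
(Debian/Ubuntu use /var/log/syslog, RHEL/CentOS use /var/log/messages), and
running `dmesg | grep -i oom` gives much the same answer.

#!/usr/bin/env python
# check_oom.py -- look for OOM-killer activity in the kernel log.
# The paths below are assumptions; adjust them for your distribution.
import re
import sys

LOG_PATHS = ["/var/log/syslog", "/var/log/messages"]
PATTERN = re.compile(r"out of memory|oom-killer|killed process", re.IGNORECASE)

def scan(path):
    """Return the matching lines from one log file, or [] if it is unreadable."""
    try:
        with open(path) as f:
            return [line.rstrip() for line in f if PATTERN.search(line)]
    except IOError:
        return []

if __name__ == "__main__":
    hits = [line for path in LOG_PATHS for line in scan(path)]
    if hits:
        print("Possible OOM-killer activity:")
        for h in hits:
            print("  " + h)
    else:
        print("No OOM-killer messages found in: " + ", ".join(LOG_PATHS))
        sys.exit(1)

If the killed process named in those lines is the NodeManager, DataNode, or
the client JVM, the disappearing daemons most likely come from memory pressure
on the box rather than anything YARN-specific.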

Re: Yarn -- one of the daemons getting killed

Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Vinod,

  One more thing I observed is that my Client, which submits one Application
Master after another continuously, also gets killed sometimes. So it is always
one of the Java processes that is getting killed. Does this indicate excessive
memory usage by them, or something like that, that is causing them to die? If
so, how can we resolve this kind of issue?

Thanks,
Kishore


On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri <
write2kishore@gmail.com> wrote:

> No, I am running on 2 node cluster.
>
>
> On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli <
> vinodkv@hortonworks.com> wrote:
>
>> Is all of this on a single node?
>>
>>  Thanks,
>> +Vinod
>>
>> On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <
>> write2kishore@gmail.com> wrote:
>>
>> Hi,
>>   I am running a small application on YARN (2.2.0) in a loop of 500
>> times, and while doing so one of the daemons, node manager, resource
>> manager, or data node is getting killed (I mean disappearing) at a random
>> point. I see no information in the corresponding log files. How can I know
>> why is it happening so?
>>
>>  And, one more observation is that, this is happening only when I am
>> using "*" for node name in the container requests, otherwise when I used a
>> specific node name, everything is fine.
>>
>> Thanks,
>> Kishore
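
To answer the excessive-memory question directly, it can help to sample the
resident memory of the daemons and of the client while the 500-iteration loop
runs; steady growth points at a leak or an oversized heap. The sketch below is
one possible way to do that on Linux by reading /proc; it is not from the
thread, and the process-name patterns are only examples to be replaced with
whatever jps or ps actually shows on your nodes.

#!/usr/bin/env python
# rss_watch.py -- periodically sample resident memory (VmRSS) of Java processes.
# The PATTERNS list is an example; substitute the class names that appear in
# your own `ps`/`jps` output (e.g. your client's main class).
import glob
import time

PATTERNS = ["NodeManager", "ResourceManager", "DataNode", "NameNode"]
INTERVAL_SECONDS = 30

def matching_pids():
    """Return (pid, label) pairs for processes whose command line matches."""
    found = []
    for path in glob.glob("/proc/[0-9]*/cmdline"):
        pid = path.split("/")[2]
        try:
            with open(path, "rb") as f:
                cmd = f.read().replace(b"\x00", b" ").decode("ascii", "replace")
        except IOError:
            continue  # the process exited between glob() and open()
        for pattern in PATTERNS:
            if pattern in cmd:
                found.append((pid, pattern))
                break
    return found

def rss_kb(pid):
    """Return the VmRSS of a pid in kB, or None if it cannot be read."""
    try:
        with open("/proc/%s/status" % pid) as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except IOError:
        pass
    return None

if __name__ == "__main__":
    while True:
        stamp = time.strftime("%H:%M:%S")
        for pid, label in matching_pids():
            rss = rss_kb(pid)
            if rss is not None:
                print("%s %s pid=%s rss=%d kB" % (stamp, label, pid, rss))
        time.sleep(INTERVAL_SECONDS)

If the combined resident sizes approach the node's physical memory, the usual
fix is to lower the daemons' heap sizes and the per-container memory so that
everything fits, or to run fewer processes per node.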

Re: Yarn -- one of the daemons getting killed

Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
No, I am running on a 2-node cluster.


On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli <
vinodkv@hortonworks.com> wrote:

> Is all of this on a single node?
>
> Thanks,
> +Vinod
>
> On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
> Hi,
>   I am running a small application on YARN (2.2.0) in a loop of 500 times,
> and while doing so one of the daemons, node manager, resource manager, or
> data node is getting killed (I mean disappearing) at a random point. I see
> no information in the corresponding log files. How can I know why is it
> happening so?
>
>  And, one more observation is that, this is happening only when I am using
> "*" for node name in the container requests, otherwise when I used a
> specific node name, everything is fine.
>
> Thanks,
> Kishore

Re: Yarn -- one of the daemons getting killed

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
Is all of this on a single node?

Thanks,
+Vinod

On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <wr...@gmail.com> wrote:

> Hi,
>   I am running a small application on YARN (2.2.0) in a loop of 500 times, and while doing so one of the daemons, node manager, resource manager, or data node is getting killed (I mean disappearing) at a random point. I see no information in the corresponding log files. How can I know why is it happening so?
> 
>  And, one more observation is that, this is happening only when I am using "*" for node name in the container requests, otherwise when I used a specific node name, everything is fine.
> 
> Thanks,
> Kishore

