Posted to hdfs-user@hadoop.apache.org by Krishna Kishore Bonagiri <wr...@gmail.com> on 2013/03/20 12:24:03 UTC

Too many open files error with YARN

Hi,

 I am running a date command with YARN's distributed shell example in a
loop, 1000 times, in this way:

yarn jar
/home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
org.apache.hadoop.yarn.applications.distributedshell.Client --jar
/home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
--shell_command date --num_containers 2


Around the 730th iteration or so, I get an error in the node manager's log
saying that it failed to launch a container because there are "Too many open
files". When I observe through the lsof command, I find that one connection
of this kind is left open for each run of the Application Master, and the
count keeps growing as I run in a loop:

node1:44871->node1:50010

Is this a known issue? Or am I missing something? Please help.

Note: I am working on hadoop-2.0.0-alpha

Thanks,
Kishore
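
A simple way to watch the leak grow between runs is to count the CLOSE_WAIT sockets to the DataNode port in lsof output. This is a rough sketch, not part of the original report: the `count_close_wait` helper name is made up for illustration, and the port 50010 and sample lines come from the output above.

```shell
# Sketch: count leaked DataNode connections (port 50010) in lsof-style
# output. On a live node you would pipe real output into the helper:
#   lsof -n -P -iTCP | count_close_wait
count_close_wait() {
  # Reads lsof-style lines on stdin; counts CLOSE_WAIT sockets to :50010.
  grep -c ':50010 (CLOSE_WAIT)'
}

# Demo with output captured from this thread:
count_close_wait <<'EOF'
java  30718  dsadm  200u  IPv4  1178376459  0t0  TCP *:50010 (LISTEN)
java  31512  dsadm  240u  IPv6  1178391921  0t0  TCP node1:51342->node1:50010 (CLOSE_WAIT)
EOF
# prints: 1
```

Running this after each iteration of the loop would show whether the count climbs by one per Application Master run, as described above.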

Re: Too many open files error with YARN

Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Thanks Manoj for your suggestion. I have just compared one of the files in
the patch, and it is not there in my version of the 2.0.0-alpha code. So I
don't have that fix.

Thanks,
Kishore


On Thu, Mar 21, 2013 at 1:55 PM, Manoj Babu <ma...@gmail.com> wrote:

> In the mean time you can quickly compare the source of the class
> with provided patch in the bug.
>
> Cheers!
> Manoj.
>
>
> On Thu, Mar 21, 2013 at 12:13 PM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
>> Hi Hemanth & Sandy,
>>
>>   Thanks for your reply. Yes, that indicates it is in close wait state,
>> exactly like below:
>>
>> java      30718     dsadm  200u     IPv4         1178376459      0t0
>>    TCP *:50010 (LISTEN)
>> java      31512     dsadm  240u     IPv6         1178391921      0t0
>>    TCP node1:51342->node1:50010 (CLOSE_WAIT)
>>
>> I just checked in at the link
>> https://issues.apache.org/jira/browse/HDFS-3357 it shows 2.0.0-alpha
>> both in affect versions and fix versions.
>>
>> There is another bug 3591, at
>> https://issues.apache.org/jira/browse/HDFS-3591
>>
>> which says it is for backporting 3357 to branch 0.23
>>
>> So, I don't understand whether the fix is really in 2.0.0-alpha, request
>> you to please clarify me.
>>
>> Thanks,
>> Kishore
>>
>>
>>
>>
>>
>> On Thu, Mar 21, 2013 at 9:57 AM, Hemanth Yamijala <
>> yhemanth@thoughtworks.com> wrote:
>>
>>> There was an issue related to hung connections (HDFS-3357). But the JIRA
>>> indicates the fix is available in Hadoop-2.0.0-alpha. Still, would be worth
>>> checking on Sandy's suggestion
>>>
>>>
>>> On Wed, Mar 20, 2013 at 11:09 PM, Sandy Ryza <sa...@cloudera.com> wrote:
>>>
>>>> Hi Kishore,
>>>>
>>>> 50010 is the datanode port. Does your lsof indicate that the sockets
>>>> are in CLOSE_WAIT?  I had come across an issue like this where that was a
>>>> symptom.
>>>>
>>>> -Sandy
>>>>
>>>>
>>>> On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri <
>>>> write2kishore@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>  I am running a date command with YARN's distributed shell example in
>>>>> a loop of 1000 times in this way:
>>>>>
>>>>> yarn jar
>>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>>> org.apache.hadoop.yarn.applications.distributedshell.Client --jar
>>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>>> --shell_command date --num_containers 2
>>>>>
>>>>>
>>>>> Around 730th time or so, I am getting an error in node manager's log
>>>>> saying that it failed to launch container because there are "Too many open
>>>>> files" and when I observe through lsof command,I find that there is one
>>>>> instance of this kind of file is left for each run of Application Master,
>>>>> and it kept growing as I am running it in loop.
>>>>>
>>>>> node1:44871->node1:50010
>>>>>
>>>>> Is this a known issue? Or am I missing doing something? Please help.
>>>>>
>>>>> Note: I am working on hadoop--2.0.0-alpha
>>>>>
>>>>> Thanks,
>>>>> Kishore
>>>>>
>>>>
>>>>
>>>
>>
>


Re: Too many open files error with YARN

Posted by Manoj Babu <ma...@gmail.com>.
In the meantime, you can quickly compare the source of the class with the
patch provided in the bug.

Cheers!
Manoj.
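
One way to do this comparison is to first list the files the JIRA patch touches, then check each against the local source tree. A minimal sketch, assuming the patch is a unified diff downloaded from the JIRA attachments list; the `patch_files` helper name is made up for illustration.

```shell
# Sketch: print the paths a unified-diff patch modifies, so each can be
# checked against (or diffed with) the local 2.0.0-alpha source tree.
patch_files() {
  # Reads a unified diff on stdin; prints the paths from "+++" headers,
  # stripping the conventional a/ or b/ prefix if present.
  awk '/^\+\+\+ / { sub(/^\+\+\+ /, ""); sub(/^[ab]\//, ""); print $1 }'
}

# Typical use (the patch URL comes from the JIRA attachments list):
#   curl -s "$PATCH_URL" | patch_files | while read -r f; do
#     [ -e "$f" ] || echo "missing locally: $f"
#   done
```

A file reported as "missing locally" is a quick sign, as in Kishore's reply below, that the local tree predates the fix.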


On Thu, Mar 21, 2013 at 12:13 PM, Krishna Kishore Bonagiri <
write2kishore@gmail.com> wrote:

> Hi Hemanth & Sandy,
>
>   Thanks for your reply. Yes, that indicates it is in close wait state,
> exactly like below:
>
> java      30718     dsadm  200u     IPv4         1178376459      0t0
>  TCP *:50010 (LISTEN)
> java      31512     dsadm  240u     IPv6         1178391921      0t0
>  TCP node1:51342->node1:50010 (CLOSE_WAIT)
>
> I just checked in at the link
> https://issues.apache.org/jira/browse/HDFS-3357 it shows 2.0.0-alpha both
> in affect versions and fix versions.
>
> There is another bug 3591, at
> https://issues.apache.org/jira/browse/HDFS-3591
>
> which says it is for backporting 3357 to branch 0.23
>
> So, I don't understand whether the fix is really in 2.0.0-alpha, request
> you to please clarify me.
>
> Thanks,
> Kishore
>
>
>
>
>
> On Thu, Mar 21, 2013 at 9:57 AM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
>
>> There was an issue related to hung connections (HDFS-3357). But the JIRA
>> indicates the fix is available in Hadoop-2.0.0-alpha. Still, would be worth
>> checking on Sandy's suggestion
>>
>>
>> On Wed, Mar 20, 2013 at 11:09 PM, Sandy Ryza <sa...@cloudera.com> wrote:
>>
>>> Hi Kishore,
>>>
>>> 50010 is the datanode port. Does your lsof indicate that the sockets are
>>> in CLOSE_WAIT?  I had come across an issue like this where that was a
>>> symptom.
>>>
>>> -Sandy
>>>
>>>
>>> On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri <
>>> write2kishore@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>>  I am running a date command with YARN's distributed shell example in a
>>>> loop of 1000 times in this way:
>>>>
>>>> yarn jar
>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>> org.apache.hadoop.yarn.applications.distributedshell.Client --jar
>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>> --shell_command date --num_containers 2
>>>>
>>>>
>>>> Around 730th time or so, I am getting an error in node manager's log
>>>> saying that it failed to launch container because there are "Too many open
>>>> files" and when I observe through lsof command,I find that there is one
>>>> instance of this kind of file is left for each run of Application Master,
>>>> and it kept growing as I am running it in loop.
>>>>
>>>> node1:44871->node1:50010
>>>>
>>>> Is this a known issue? Or am I missing doing something? Please help.
>>>>
>>>> Note: I am working on hadoop--2.0.0-alpha
>>>>
>>>> Thanks,
>>>> Kishore
>>>>
>>>
>>>
>>
>

Re: Too many open files error with YARN

Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Hi Hemanth,
  Thanks for the reply. I shall try to get that jstack and reply back. I am
also trying to download hadoop-2.0.3-alpha to see if I can overcome this
error.

Thanks,
Kishore




On Thu, Mar 21, 2013 at 3:24 PM, Hemanth Yamijala <yhemanth@thoughtworks.com
> wrote:

> There is a way to confirm if it is the same bug. Can you pick a jstack on
> the process that has established a connection to 50010 and post it here..
>
> Thanks
> hemanth
>
>
> On Thu, Mar 21, 2013 at 12:13 PM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
>> Hi Hemanth & Sandy,
>>
>>   Thanks for your reply. Yes, that indicates it is in close wait state,
>> exactly like below:
>>
>> java      30718     dsadm  200u     IPv4         1178376459      0t0
>>    TCP *:50010 (LISTEN)
>> java      31512     dsadm  240u     IPv6         1178391921      0t0
>>    TCP node1:51342->node1:50010 (CLOSE_WAIT)
>>
>> I just checked in at the link
>> https://issues.apache.org/jira/browse/HDFS-3357 it shows 2.0.0-alpha
>> both in affect versions and fix versions.
>>
>> There is another bug 3591, at
>> https://issues.apache.org/jira/browse/HDFS-3591
>>
>> which says it is for backporting 3357 to branch 0.23
>>
>> So, I don't understand whether the fix is really in 2.0.0-alpha, request
>> you to please clarify me.
>>
>> Thanks,
>> Kishore
>>
>>
>>
>>
>>
>> On Thu, Mar 21, 2013 at 9:57 AM, Hemanth Yamijala <
>> yhemanth@thoughtworks.com> wrote:
>>
>>> There was an issue related to hung connections (HDFS-3357). But the JIRA
>>> indicates the fix is available in Hadoop-2.0.0-alpha. Still, would be worth
>>> checking on Sandy's suggestion
>>>
>>>
>>> On Wed, Mar 20, 2013 at 11:09 PM, Sandy Ryza <sa...@cloudera.com> wrote:
>>>
>>>> Hi Kishore,
>>>>
>>>> 50010 is the datanode port. Does your lsof indicate that the sockets
>>>> are in CLOSE_WAIT?  I had come across an issue like this where that was a
>>>> symptom.
>>>>
>>>> -Sandy
>>>>
>>>>
>>>> On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri <
>>>> write2kishore@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>  I am running a date command with YARN's distributed shell example in
>>>>> a loop of 1000 times in this way:
>>>>>
>>>>> yarn jar
>>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>>> org.apache.hadoop.yarn.applications.distributedshell.Client --jar
>>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>>> --shell_command date --num_containers 2
>>>>>
>>>>>
>>>>> Around 730th time or so, I am getting an error in node manager's log
>>>>> saying that it failed to launch container because there are "Too many open
>>>>> files" and when I observe through lsof command,I find that there is one
>>>>> instance of this kind of file is left for each run of Application Master,
>>>>> and it kept growing as I am running it in loop.
>>>>>
>>>>> node1:44871->node1:50010
>>>>>
>>>>> Is this a known issue? Or am I missing doing something? Please help.
>>>>>
>>>>> Note: I am working on hadoop--2.0.0-alpha
>>>>>
>>>>> Thanks,
>>>>> Kishore
>>>>>
>>>>
>>>>
>>>
>>
>


Re: Too many open files error with YARN

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
There is a way to confirm whether it is the same bug. Can you take a jstack
of the process that has an established connection to 50010 and post it here?

Thanks
hemanth
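
The suggestion above can be sketched as a small shell helper. The `pids_to_port` function name and the sample lsof line are illustrative assumptions; `lsof` and `jstack` (shipped with the JDK) are assumed to be on the PATH.

```shell
# Sketch: pick the PIDs whose connections go to the DataNode port out of
# lsof output, so a jstack can be taken of each offending process.
pids_to_port() {
  # Reads `lsof -n -P -iTCP`-style lines on stdin; prints unique PIDs
  # whose connection's remote end is the given port.
  awk -v p="$1" '$0 ~ ("->[^ ]*:" p " ") { print $2 }' | sort -u
}

# On a live node you would run something like:
#   lsof -n -P -iTCP | pids_to_port 50010 | xargs -r -n1 jstack
# Demo with a captured lsof line:
printf '%s\n' \
  'java 31512 dsadm 240u IPv6 1178391921 0t0 TCP node1:51342->node1:50010 (CLOSE_WAIT)' \
  | pids_to_port 50010
# prints: 31512
```

The resulting thread dump would show which threads are holding the socket open, which is what distinguishes this bug from an ordinary descriptor limit.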


On Thu, Mar 21, 2013 at 12:13 PM, Krishna Kishore Bonagiri <
write2kishore@gmail.com> wrote:

> Hi Hemanth & Sandy,
>
>   Thanks for your reply. Yes, that indicates it is in close wait state,
> exactly like below:
>
> java      30718     dsadm  200u     IPv4         1178376459      0t0
>  TCP *:50010 (LISTEN)
> java      31512     dsadm  240u     IPv6         1178391921      0t0
>  TCP node1:51342->node1:50010 (CLOSE_WAIT)
>
> I just checked in at the link
> https://issues.apache.org/jira/browse/HDFS-3357 it shows 2.0.0-alpha both
> in affect versions and fix versions.
>
> There is another bug 3591, at
> https://issues.apache.org/jira/browse/HDFS-3591
>
> which says it is for backporting 3357 to branch 0.23
>
> So, I don't understand whether the fix is really in 2.0.0-alpha, request
> you to please clarify me.
>
> Thanks,
> Kishore
>
>
>
>
>
> On Thu, Mar 21, 2013 at 9:57 AM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
>
>> There was an issue related to hung connections (HDFS-3357). But the JIRA
>> indicates the fix is available in Hadoop-2.0.0-alpha. Still, would be worth
>> checking on Sandy's suggestion
>>
>>
>> On Wed, Mar 20, 2013 at 11:09 PM, Sandy Ryza <sa...@cloudera.com> wrote:
>>
>>> Hi Kishore,
>>>
>>> 50010 is the datanode port. Does your lsof indicate that the sockets are
>>> in CLOSE_WAIT?  I had come across an issue like this where that was a
>>> symptom.
>>>
>>> -Sandy
>>>
>>>
>>> On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri <
>>> write2kishore@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>>  I am running a date command with YARN's distributed shell example in a
>>>> loop of 1000 times in this way:
>>>>
>>>> yarn jar
>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>> org.apache.hadoop.yarn.applications.distributedshell.Client --jar
>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>> --shell_command date --num_containers 2
>>>>
>>>>
>>>> Around 730th time or so, I am getting an error in node manager's log
>>>> saying that it failed to launch container because there are "Too many open
>>>> files" and when I observe through lsof command,I find that there is one
>>>> instance of this kind of file is left for each run of Application Master,
>>>> and it kept growing as I am running it in loop.
>>>>
>>>> node1:44871->node1:50010
>>>>
>>>> Is this a known issue? Or am I missing doing something? Please help.
>>>>
>>>> Note: I am working on hadoop--2.0.0-alpha
>>>>
>>>> Thanks,
>>>> Kishore
>>>>
>>>
>>>
>>
>
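To act on the jstack suggestion above, one way is to pull the PID out of the
lsof output first. This is only a sketch: the lsof line is the sample posted
in this thread, and the jstack call is left commented out since it needs the
live JVM.

```shell
# Sketch: extract the PID (second column) from an lsof line like the one
# posted in this thread, then capture a thread dump of that process.
lsof_line='java      31512     dsadm  240u     IPv6  1178391921  0t0  TCP node1:51342->node1:50010 (CLOSE_WAIT)'
pid=$(printf '%s\n' "$lsof_line" | awk '{print $2}')
echo "$pid"   # prints 31512
# jstack "$pid" > /tmp/jstack-"$pid".txt   # run this against the live process
```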

Re: Too many open files error with YARN

Posted by Manoj Babu <ma...@gmail.com>.
In the meantime, you can quickly compare the source of the class
with the patch provided in the bug.

Cheers!
Manoj.


On Thu, Mar 21, 2013 at 12:13 PM, Krishna Kishore Bonagiri <
write2kishore@gmail.com> wrote:

> Hi Hemanth & Sandy,
>
>   Thanks for your reply. Yes, that indicates it is in close wait state,
> exactly like below:
>
> java      30718     dsadm  200u     IPv4         1178376459      0t0
>  TCP *:50010 (LISTEN)
> java      31512     dsadm  240u     IPv6         1178391921      0t0
>  TCP node1:51342->node1:50010 (CLOSE_WAIT)
>
> I just checked in at the link
> https://issues.apache.org/jira/browse/HDFS-3357 it shows 2.0.0-alpha both
> in affect versions and fix versions.
>
> There is another bug 3591, at
> https://issues.apache.org/jira/browse/HDFS-3591
>
> which says it is for backporting 3357 to branch 0.23
>
> So, I don't understand whether the fix is really in 2.0.0-alpha, request
> you to please clarify me.
>
> Thanks,
> Kishore
>
>
>
>
>
> On Thu, Mar 21, 2013 at 9:57 AM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
>
>> There was an issue related to hung connections (HDFS-3357). But the JIRA
>> indicates the fix is available in Hadoop-2.0.0-alpha. Still, would be worth
>> checking on Sandy's suggestion
>>
>>
>> On Wed, Mar 20, 2013 at 11:09 PM, Sandy Ryza <sa...@cloudera.com> wrote:
>>
>>> Hi Kishore,
>>>
>>> 50010 is the datanode port. Does your lsof indicate that the sockets are
>>> in CLOSE_WAIT?  I had come across an issue like this where that was a
>>> symptom.
>>>
>>> -Sandy
>>>
>>>
>>> On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri <
>>> write2kishore@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>>  I am running a date command with YARN's distributed shell example in a
>>>> loop of 1000 times in this way:
>>>>
>>>> yarn jar
>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>> org.apache.hadoop.yarn.applications.distributedshell.Client --jar
>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>> --shell_command date --num_containers 2
>>>>
>>>>
>>>> Around 730th time or so, I am getting an error in node manager's log
>>>> saying that it failed to launch container because there are "Too many open
>>>> files" and when I observe through lsof command,I find that there is one
>>>> instance of this kind of file is left for each run of Application Master,
>>>> and it kept growing as I am running it in loop.
>>>>
>>>> node1:44871->node1:50010
>>>>
>>>> Is this a known issue? Or am I missing doing something? Please help.
>>>>
>>>> Note: I am working on hadoop--2.0.0-alpha
>>>>
>>>> Thanks,
>>>> Kishore
>>>>
>>>
>>>
>>
>

Re: Too many open files error with YARN

Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Hi Hemanth & Sandy,

  Thanks for your reply. Yes, lsof indicates the sockets are in CLOSE_WAIT
state, exactly like below:

java      30718     dsadm  200u     IPv4         1178376459      0t0
 TCP *:50010 (LISTEN)
java      31512     dsadm  240u     IPv6         1178391921      0t0
 TCP node1:51342->node1:50010 (CLOSE_WAIT)

I just checked the link
https://issues.apache.org/jira/browse/HDFS-3357 and it shows 2.0.0-alpha in
both the affected versions and the fix versions.

There is another bug, HDFS-3591, at
https://issues.apache.org/jira/browse/HDFS-3591

which says it is for backporting HDFS-3357 to branch 0.23.

So I don't understand whether the fix is really in 2.0.0-alpha; could you
please clarify?

Thanks,
Kishore





On Thu, Mar 21, 2013 at 9:57 AM, Hemanth Yamijala <yhemanth@thoughtworks.com
> wrote:

> There was an issue related to hung connections (HDFS-3357). But the JIRA
> indicates the fix is available in Hadoop-2.0.0-alpha. Still, would be worth
> checking on Sandy's suggestion
>
>
> On Wed, Mar 20, 2013 at 11:09 PM, Sandy Ryza <sa...@cloudera.com> wrote:
>
>> Hi Kishore,
>>
>> 50010 is the datanode port. Does your lsof indicate that the sockets are
>> in CLOSE_WAIT?  I had come across an issue like this where that was a
>> symptom.
>>
>> -Sandy
>>
>>
>> On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri <
>> write2kishore@gmail.com> wrote:
>>
>>> Hi,
>>>
>>>  I am running a date command with YARN's distributed shell example in a
>>> loop of 1000 times in this way:
>>>
>>> yarn jar
>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>> org.apache.hadoop.yarn.applications.distributedshell.Client --jar
>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>> --shell_command date --num_containers 2
>>>
>>>
>>> Around 730th time or so, I am getting an error in node manager's log
>>> saying that it failed to launch container because there are "Too many open
>>> files" and when I observe through lsof command,I find that there is one
>>> instance of this kind of file is left for each run of Application Master,
>>> and it kept growing as I am running it in loop.
>>>
>>> node1:44871->node1:50010
>>>
>>> Is this a known issue? Or am I missing doing something? Please help.
>>>
>>> Note: I am working on hadoop--2.0.0-alpha
>>>
>>> Thanks,
>>> Kishore
>>>
>>
>>
>
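For anyone hitting the same symptom, a quick way to watch the leak grow
between application runs is to count the CLOSE_WAIT sockets towards the
datanode port. A sketch, using the sample lsof lines from this thread as
input; in practice, pipe real `lsof -nP -iTCP` output into it.

```shell
# Sketch: count sockets stuck in CLOSE_WAIT towards datanode port 50010.
# Fed from sample lines here; pipe live lsof output into it in practice.
count_close_wait() { grep -c ':50010 (CLOSE_WAIT)'; }
printf '%s\n' \
  'java  30718  dsadm  200u  IPv4  1178376459  0t0  TCP *:50010 (LISTEN)' \
  'java  31512  dsadm  240u  IPv6  1178391921  0t0  TCP node1:51342->node1:50010 (CLOSE_WAIT)' \
  | count_close_wait   # prints 1
```

A count that increases by one per Application Master run, as described above,
confirms the descriptors are never being closed.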

Re: Too many open files error with YARN

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
There was an issue related to hung connections (HDFS-3357), but the JIRA
indicates the fix is available in Hadoop-2.0.0-alpha. Still, it would be
worth checking on Sandy's suggestion.


On Wed, Mar 20, 2013 at 11:09 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> Hi Kishore,
>
> 50010 is the datanode port. Does your lsof indicate that the sockets are
> in CLOSE_WAIT?  I had come across an issue like this where that was a
> symptom.
>
> -Sandy
>
>
> On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
>> Hi,
>>
>>  I am running a date command with YARN's distributed shell example in a
>> loop of 1000 times in this way:
>>
>> yarn jar
>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>> org.apache.hadoop.yarn.applications.distributedshell.Client --jar
>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>> --shell_command date --num_containers 2
>>
>>
>> Around 730th time or so, I am getting an error in node manager's log
>> saying that it failed to launch container because there are "Too many open
>> files" and when I observe through lsof command,I find that there is one
>> instance of this kind of file is left for each run of Application Master,
>> and it kept growing as I am running it in loop.
>>
>> node1:44871->node1:50010
>>
>> Is this a known issue? Or am I missing doing something? Please help.
>>
>> Note: I am working on hadoop--2.0.0-alpha
>>
>> Thanks,
>> Kishore
>>
>
>
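Whichever JIRA turns out to cover the fix, the leak itself can also be
watched directly at the file-descriptor level. A Linux-only sketch via
/proc, shown here against the current shell ($$); substitute the NodeManager
PID to monitor the real process.

```shell
# Sketch (Linux): count file descriptors currently open in a process, here
# the current shell ($$). Re-run against the NodeManager PID after each
# application run -- a steadily growing count confirms the descriptor leak
# well before the "Too many open files" limit is actually hit.
ls /proc/$$/fd | wc -l
```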

Re: Too many open files error with YARN

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
There was an issue related to hung connections (HDFS-3357). But the JIRA
indicates the fix is available in Hadoop-2.0.0-alpha. Still, would be worth
checking on Sandy's suggestion


On Wed, Mar 20, 2013 at 11:09 PM, Sandy Ryza <sa...@cloudera.com>wrote:

> Hi Kishore,
>
> 50010 is the datanode port. Does your lsof indicate that the sockets are
> in CLOSE_WAIT?  I had come across an issue like this where that was a
> symptom.
>
> -Sandy
>
>
> On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
>> Hi,
>>
>>  I am running a date command with YARN's distributed shell example in a
>> loop of 1000 times in this way:
>>
>> yarn jar
>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>> org.apache.hadoop.yarn.applications.distributedshell.Client --jar
>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>> --shell_command date --num_containers 2
>>
>>
>> Around 730th time or so, I am getting an error in node manager's log
>> saying that it failed to launch container because there are "Too many open
>> files" and when I observe through lsof command,I find that there is one
>> instance of this kind of file is left for each run of Application Master,
>> and it kept growing as I am running it in loop.
>>
>> node1:44871->node1:50010
>>
>> Is this a known issue? Or am I missing doing something? Please help.
>>
>> Note: I am working on hadoop--2.0.0-alpha
>>
>> Thanks,
>> Kishore
>>
>
>

Re: Too many open files error with YARN

Posted by Sandy Ryza <sa...@cloudera.com>.
Hi Kishore,

50010 is the datanode port. Does your lsof indicate that the sockets are in
CLOSE_WAIT?  I had come across an issue like this where that was a symptom.

-Sandy

On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri <
write2kishore@gmail.com> wrote:

> Hi,
>
>  I am running a date command with YARN's distributed shell example in a
> loop of 1000 times in this way:
>
> yarn jar
> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
> org.apache.hadoop.yarn.applications.distributedshell.Client --jar
> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
> --shell_command date --num_containers 2
>
>
> Around the 730th run or so, I am getting an error in the node manager's log
> saying that it failed to launch a container because there are "Too many open
> files", and when I observe through the lsof command I find that one instance
> of this kind of file is left behind for each run of the Application Master,
> and the count keeps growing as I run it in a loop.
>
> node1:44871->node1:50010
>
> Is this a known issue? Or am I missing something? Please help.
>
> Note: I am working on hadoop-2.0.0-alpha
>
> Thanks,
> Kishore
>
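[Editor's note] A quick way to test Sandy's CLOSE_WAIT hypothesis is to filter
the lsof output for connections to the datanode port. The sketch below runs
against a captured sample in the same format as the lsof lines quoted in this
thread; against a live NodeManager you would pipe
`lsof -p <nodemanager-pid> -a -i TCP` instead (the PID and sample lines here
are placeholders):

```shell
# Sample lsof output in the format quoted in this thread; against a live
# process, substitute: lsof -p <nodemanager-pid> -a -i TCP
sample='java 30718 dsadm 200u IPv4 TCP node1:50010 (LISTEN)
java 30718 dsadm 201u IPv4 TCP node1:44871->node1:50010 (CLOSE_WAIT)
java 30718 dsadm 202u IPv4 TCP node1:44872->node1:50010 (CLOSE_WAIT)'

# Count connections to the datanode port (50010) stuck in CLOSE_WAIT;
# a number that grows with every AM run points at unclosed HDFS streams.
count=$(printf '%s\n' "$sample" | grep -- '->.*:50010' | grep -c 'CLOSE_WAIT')
echo "leaked sockets: $count"
```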
