Posted to hdfs-user@hadoop.apache.org by Krishna Kishore Bonagiri <wr...@gmail.com> on 2013/03/20 12:24:03 UTC
Too many open files error with YARN
Hi,
I am running the date command with YARN's distributed shell example in a
loop of 1,000 iterations, like this:
yarn jar
/home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
org.apache.hadoop.yarn.applications.distributedshell.Client --jar
/home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
--shell_command date --num_containers 2
Around the 730th iteration or so, I get an error in the node manager's log
saying that it failed to launch a container because of "Too many open
files". When I observe with the lsof command, I find that one socket of
this kind is left open for each run of the Application Master, and the
count keeps growing as the loop runs:
node1:44871->node1:50010
Is this a known issue? Or am I missing something? Please help.
Note: I am working on hadoop-2.0.0-alpha
Thanks,
Kishore
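The growth described above can be spotted by filtering lsof output for CLOSE_WAIT connections to the DataNode port (50010). A minimal sketch follows; the sample lines and the dsadm user are illustrative stand-ins, not output from a real cluster, where you would instead pipe in `lsof -nP -iTCP:50010`:

```shell
#!/bin/sh
# Count leaked DataNode connections (port 50010) stuck in CLOSE_WAIT.
# On a live node, feed this from:  lsof -nP -iTCP:50010
# A captured sample stands in here so the filter itself can be checked.
sample='java 30718 dsadm 200u IPv4 1178376459 0t0 TCP *:50010 (LISTEN)
java 31512 dsadm 240u IPv6 1178391921 0t0 TCP node1:51342->node1:50010 (CLOSE_WAIT)
java 31513 dsadm 241u IPv6 1178391922 0t0 TCP node1:51343->node1:50010 (CLOSE_WAIT)'
leaked=$(printf '%s\n' "$sample" | grep -c 'CLOSE_WAIT')
echo "leaked sockets: $leaked"
```

Running the count once per loop iteration makes the one-socket-per-AM growth easy to confirm.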
Re: Too many open files error with YARN
Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Thanks, Manoj, for your suggestion. I have just compared one of the files
in the patch, and it is not present in my version of the 2.0.0-alpha code.
So I don't have that fix.
Thanks,
Kishore
On Thu, Mar 21, 2013 at 1:55 PM, Manoj Babu <ma...@gmail.com> wrote:
> In the mean time you can quickly compare the source of the class
> with provided patch in the bug.
>
> Cheers!
> Manoj.
>
>
> On Thu, Mar 21, 2013 at 12:13 PM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
>> Hi Hemanth & Sandy,
>>
>> Thanks for your reply. Yes, that indicates it is in close wait state,
>> exactly like below:
>>
>> java 30718 dsadm 200u IPv4 1178376459 0t0
>> TCP *:50010 (LISTEN)
>> java 31512 dsadm 240u IPv6 1178391921 0t0
>> TCP node1:51342->node1:50010 (CLOSE_WAIT)
>>
>> I just checked in at the link
>> https://issues.apache.org/jira/browse/HDFS-3357 it shows 2.0.0-alpha
>> both in affect versions and fix versions.
>>
>> There is another bug 3591, at
>> https://issues.apache.org/jira/browse/HDFS-3591
>>
>> which says it is for backporting 3357 to branch 0.23
>>
>> So, I don't understand whether the fix is really in 2.0.0-alpha, request
>> you to please clarify me.
>>
>> Thanks,
>> Kishore
>>
>>
>>
>>
>>
>> On Thu, Mar 21, 2013 at 9:57 AM, Hemanth Yamijala <
>> yhemanth@thoughtworks.com> wrote:
>>
>>> There was an issue related to hung connections (HDFS-3357). But the JIRA
>>> indicates the fix is available in Hadoop-2.0.0-alpha. Still, would be worth
>>> checking on Sandy's suggestion
>>>
>>>
>>> On Wed, Mar 20, 2013 at 11:09 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>
>>>> Hi Kishore,
>>>>
>>>> 50010 is the datanode port. Does your lsof indicate that the sockets
>>>> are in CLOSE_WAIT? I had come across an issue like this where that was a
>>>> symptom.
>>>>
>>>> -Sandy
>>>>
>>>>
>>>> On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri <
>>>> write2kishore@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am running a date command with YARN's distributed shell example in
>>>>> a loop of 1000 times in this way:
>>>>>
>>>>> yarn jar
>>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>>> org.apache.hadoop.yarn.applications.distributedshell.Client --jar
>>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>>> --shell_command date --num_containers 2
>>>>>
>>>>>
>>>>> Around 730th time or so, I am getting an error in node manager's log
>>>>> saying that it failed to launch container because there are "Too many open
>>>>> files" and when I observe through lsof command,I find that there is one
>>>>> instance of this kind of file is left for each run of Application Master,
>>>>> and it kept growing as I am running it in loop.
>>>>>
>>>>> node1:44871->node1:50010
>>>>>
>>>>> Is this a known issue? Or am I missing doing something? Please help.
>>>>>
>>>>> Note: I am working on hadoop--2.0.0-alpha
>>>>>
>>>>> Thanks,
>>>>> Kishore
>>>>>
>>>>
>>>>
>>>
>>
>
Re: Too many open files error with YARN
Posted by Manoj Babu <ma...@gmail.com>.
In the meantime, you can quickly compare the source of the class
with the patch provided in the bug.
Cheers!
Manoj.
On Thu, Mar 21, 2013 at 12:13 PM, Krishna Kishore Bonagiri <
write2kishore@gmail.com> wrote:
> Hi Hemanth & Sandy,
>
> Thanks for your reply. Yes, that indicates it is in close wait state,
> exactly like below:
>
> java 30718 dsadm 200u IPv4 1178376459 0t0
> TCP *:50010 (LISTEN)
> java 31512 dsadm 240u IPv6 1178391921 0t0
> TCP node1:51342->node1:50010 (CLOSE_WAIT)
>
> I just checked in at the link
> https://issues.apache.org/jira/browse/HDFS-3357 it shows 2.0.0-alpha both
> in affect versions and fix versions.
>
> There is another bug 3591, at
> https://issues.apache.org/jira/browse/HDFS-3591
>
> which says it is for backporting 3357 to branch 0.23
>
> So, I don't understand whether the fix is really in 2.0.0-alpha, request
> you to please clarify me.
>
> Thanks,
> Kishore
>
>
>
>
>
> On Thu, Mar 21, 2013 at 9:57 AM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
>
>> There was an issue related to hung connections (HDFS-3357). But the JIRA
>> indicates the fix is available in Hadoop-2.0.0-alpha. Still, would be worth
>> checking on Sandy's suggestion
>>
>>
>> On Wed, Mar 20, 2013 at 11:09 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>
>>> Hi Kishore,
>>>
>>> 50010 is the datanode port. Does your lsof indicate that the sockets are
>>> in CLOSE_WAIT? I had come across an issue like this where that was a
>>> symptom.
>>>
>>> -Sandy
>>>
>>>
>>> On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri <
>>> write2kishore@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am running a date command with YARN's distributed shell example in a
>>>> loop of 1000 times in this way:
>>>>
>>>> yarn jar
>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>> org.apache.hadoop.yarn.applications.distributedshell.Client --jar
>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>> --shell_command date --num_containers 2
>>>>
>>>>
>>>> Around 730th time or so, I am getting an error in node manager's log
>>>> saying that it failed to launch container because there are "Too many open
>>>> files" and when I observe through lsof command,I find that there is one
>>>> instance of this kind of file is left for each run of Application Master,
>>>> and it kept growing as I am running it in loop.
>>>>
>>>> node1:44871->node1:50010
>>>>
>>>> Is this a known issue? Or am I missing doing something? Please help.
>>>>
>>>> Note: I am working on hadoop--2.0.0-alpha
>>>>
>>>> Thanks,
>>>> Kishore
>>>>
>>>
>>>
>>
>
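Manoj's suggestion above (comparing local source against the JIRA patch) can start by listing which files the patch touches. A hedged sketch; the patch content below is a fabricated stand-in, not the real HDFS-3357 patch:

```shell
#!/bin/sh
# List the source files a unified-diff patch modifies, so each can be
# compared against the local source tree. The patch text is illustrative.
patch='--- a/src/main/java/org/apache/hadoop/hdfs/SomeClass.java
+++ b/src/main/java/org/apache/hadoop/hdfs/SomeClass.java
@@ -1,1 +1,1 @@'
# Take the "+++" target paths and strip the conventional "b/" prefix.
files=$(printf '%s\n' "$patch" | awk '/^\+\+\+ /{sub(/^b\//,"",$2); print $2}')
echo "$files"
```

Each listed file can then be diffed against the same path under the unpacked 2.0.0-alpha source to see whether the fix is present.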
Re: Too many open files error with YARN
Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Hi Hemanth,
Thanks for the reply. I shall try to get that jstack and reply back. I am
also trying to download hadoop-2.0.3-alpha to see if I can overcome this
error.
Thanks,
Kishore
On Thu, Mar 21, 2013 at 3:24 PM, Hemanth Yamijala <yhemanth@thoughtworks.com
> wrote:
> There is a way to confirm if it is the same bug. Can you pick a jstack on
> the process that has established a connection to 50010 and post it here..
>
> Thanks
> hemanth
>
>
> On Thu, Mar 21, 2013 at 12:13 PM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
>> Hi Hemanth & Sandy,
>>
>> Thanks for your reply. Yes, that indicates it is in close wait state,
>> exactly like below:
>>
>> java 30718 dsadm 200u IPv4 1178376459 0t0
>> TCP *:50010 (LISTEN)
>> java 31512 dsadm 240u IPv6 1178391921 0t0
>> TCP node1:51342->node1:50010 (CLOSE_WAIT)
>>
>> I just checked in at the link
>> https://issues.apache.org/jira/browse/HDFS-3357 it shows 2.0.0-alpha
>> both in affect versions and fix versions.
>>
>> There is another bug 3591, at
>> https://issues.apache.org/jira/browse/HDFS-3591
>>
>> which says it is for backporting 3357 to branch 0.23
>>
>> So, I don't understand whether the fix is really in 2.0.0-alpha, request
>> you to please clarify me.
>>
>> Thanks,
>> Kishore
>>
>>
>>
>>
>>
>> On Thu, Mar 21, 2013 at 9:57 AM, Hemanth Yamijala <
>> yhemanth@thoughtworks.com> wrote:
>>
>>> There was an issue related to hung connections (HDFS-3357). But the JIRA
>>> indicates the fix is available in Hadoop-2.0.0-alpha. Still, would be worth
>>> checking on Sandy's suggestion
>>>
>>>
>>> On Wed, Mar 20, 2013 at 11:09 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>
>>>> Hi Kishore,
>>>>
>>>> 50010 is the datanode port. Does your lsof indicate that the sockets
>>>> are in CLOSE_WAIT? I had come across an issue like this where that was a
>>>> symptom.
>>>>
>>>> -Sandy
>>>>
>>>>
>>>> On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri <
>>>> write2kishore@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am running a date command with YARN's distributed shell example in
>>>>> a loop of 1000 times in this way:
>>>>>
>>>>> yarn jar
>>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>>> org.apache.hadoop.yarn.applications.distributedshell.Client --jar
>>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>>> --shell_command date --num_containers 2
>>>>>
>>>>>
>>>>> Around 730th time or so, I am getting an error in node manager's log
>>>>> saying that it failed to launch container because there are "Too many open
>>>>> files" and when I observe through lsof command,I find that there is one
>>>>> instance of this kind of file is left for each run of Application Master,
>>>>> and it kept growing as I am running it in loop.
>>>>>
>>>>> node1:44871->node1:50010
>>>>>
>>>>> Is this a known issue? Or am I missing doing something? Please help.
>>>>>
>>>>> Note: I am working on hadoop--2.0.0-alpha
>>>>>
>>>>> Thanks,
>>>>> Kishore
>>>>>
>>>>
>>>>
>>>
>>
>
Re: Too many open files error with YARN
Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
There is a way to confirm whether it is the same bug. Can you take a jstack
of the process that has established a connection to port 50010 and post it here?
Thanks
hemanth
On Thu, Mar 21, 2013 at 12:13 PM, Krishna Kishore Bonagiri <
write2kishore@gmail.com> wrote:
> Hi Hemanth & Sandy,
>
> Thanks for your reply. Yes, that indicates it is in close wait state,
> exactly like below:
>
> java 30718 dsadm 200u IPv4 1178376459 0t0
> TCP *:50010 (LISTEN)
> java 31512 dsadm 240u IPv6 1178391921 0t0
> TCP node1:51342->node1:50010 (CLOSE_WAIT)
>
> I just checked in at the link
> https://issues.apache.org/jira/browse/HDFS-3357 it shows 2.0.0-alpha both
> in affect versions and fix versions.
>
> There is another bug 3591, at
> https://issues.apache.org/jira/browse/HDFS-3591
>
> which says it is for backporting 3357 to branch 0.23
>
> So, I don't understand whether the fix is really in 2.0.0-alpha, request
> you to please clarify me.
>
> Thanks,
> Kishore
>
>
>
>
>
> On Thu, Mar 21, 2013 at 9:57 AM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
>
>> There was an issue related to hung connections (HDFS-3357). But the JIRA
>> indicates the fix is available in Hadoop-2.0.0-alpha. Still, would be worth
>> checking on Sandy's suggestion
>>
>>
>> On Wed, Mar 20, 2013 at 11:09 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>
>>> Hi Kishore,
>>>
>>> 50010 is the datanode port. Does your lsof indicate that the sockets are
>>> in CLOSE_WAIT? I had come across an issue like this where that was a
>>> symptom.
>>>
>>> -Sandy
>>>
>>>
>>> On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri <
>>> write2kishore@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am running a date command with YARN's distributed shell example in a
>>>> loop of 1000 times in this way:
>>>>
>>>> yarn jar
>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>> org.apache.hadoop.yarn.applications.distributedshell.Client --jar
>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>> --shell_command date --num_containers 2
>>>>
>>>>
>>>> Around 730th time or so, I am getting an error in node manager's log
>>>> saying that it failed to launch container because there are "Too many open
>>>> files" and when I observe through lsof command,I find that there is one
>>>> instance of this kind of file is left for each run of Application Master,
>>>> and it kept growing as I am running it in loop.
>>>>
>>>> node1:44871->node1:50010
>>>>
>>>> Is this a known issue? Or am I missing doing something? Please help.
>>>>
>>>> Note: I am working on hadoop-2.0.0-alpha
>>>>
>>>> Thanks,
>>>> Kishore
>>>>
>>>
>>>
>>
>
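The lsof check described above can be sketched as follows. This is a minimal sketch, run here against a sample line captured from this thread rather than a live NodeManager; against a live cluster the input would come from "lsof -n -i TCP" or "lsof -p <nodemanager-pid>" instead:

```shell
# Count sockets to the DataNode port (50010) that are stuck in CLOSE_WAIT.
# The grep filter runs here on a captured sample of lsof output so it can
# be checked; pipe real lsof output into the same filter on a live system.
sample='java 31512 dsadm  240u IPv6 1178391921 0t0 TCP node1:51342->node1:50010 (CLOSE_WAIT)
java 30718 dsadm  200u IPv4 1178376459 0t0 TCP *:50010 (LISTEN)'
leaked=$(printf '%s\n' "$sample" | grep -c ':50010 (CLOSE_WAIT)')
echo "$leaked"   # prints 1: one leaked connection in this sample
```

Watching this count climb by one per Application Master run, as described in the first message, is the signature of the leak.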
Re: Too many open files error with YARN
Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
There is a way to confirm whether it is the same bug. Can you take a jstack of
the process that has established a connection to port 50010 and post it here?
Thanks
hemanth
On Thu, Mar 21, 2013 at 12:13 PM, Krishna Kishore Bonagiri <
write2kishore@gmail.com> wrote:
> Hi Hemanth & Sandy,
>
> Thanks for your reply. Yes, that indicates it is in close wait state,
> exactly like below:
>
> java 30718 dsadm 200u IPv4 1178376459 0t0
> TCP *:50010 (LISTEN)
> java 31512 dsadm 240u IPv6 1178391921 0t0
> TCP node1:51342->node1:50010 (CLOSE_WAIT)
>
> I just checked in at the link
> https://issues.apache.org/jira/browse/HDFS-3357 it shows 2.0.0-alpha in
> both the affected versions and the fix versions.
>
> There is another bug 3591, at
> https://issues.apache.org/jira/browse/HDFS-3591
>
> which says it is for backporting 3357 to branch 0.23
>
> So, I don't understand whether the fix is really in 2.0.0-alpha; could
> you please clarify?
>
> Thanks,
> Kishore
>
>
>
>
>
> On Thu, Mar 21, 2013 at 9:57 AM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
>
>> There was an issue related to hung connections (HDFS-3357). But the JIRA
>> indicates the fix is available in Hadoop-2.0.0-alpha. Still, would be worth
>> checking on Sandy's suggestion
>>
>>
>> On Wed, Mar 20, 2013 at 11:09 PM, Sandy Ryza <sa...@cloudera.com> wrote:
>>
>>> Hi Kishore,
>>>
>>> 50010 is the datanode port. Does your lsof indicate that the sockets are
>>> in CLOSE_WAIT? I had come across an issue like this where that was a
>>> symptom.
>>>
>>> -Sandy
>>>
>>>
>>> On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri <
>>> write2kishore@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am running a date command with YARN's distributed shell example in a
>>>> loop of 1000 times in this way:
>>>>
>>>> yarn jar
>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>> org.apache.hadoop.yarn.applications.distributedshell.Client --jar
>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>> --shell_command date --num_containers 2
>>>>
>>>>
>>>> Around the 730th iteration or so, I get an error in the node manager's
>>>> log saying that it failed to launch a container because there are "Too
>>>> many open files". When I observe through the lsof command, I find that
>>>> one file descriptor of this kind is left behind for each run of the
>>>> Application Master, and the count keeps growing as I run the loop.
>>>>
>>>> node1:44871->node1:50010
>>>>
>>>> Is this a known issue? Or am I missing doing something? Please help.
>>>>
>>>> Note: I am working on hadoop-2.0.0-alpha
>>>>
>>>> Thanks,
>>>> Kishore
>>>>
>>>
>>>
>>
>
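A hedged sketch of what taking that jstack could look like: pids_with_close_wait is a hypothetical helper (field 2 of standard lsof output is the PID), demonstrated here on a sample line from this thread; the lsof and jstack invocations themselves are shown only in comments since they need a live JVM:

```shell
# Extract the unique PIDs of processes holding CLOSE_WAIT sockets to the
# DataNode port, so each can be dumped with jstack. Field 2 of standard
# lsof output is the PID.
pids_with_close_wait() {
  awk '/:50010 \(CLOSE_WAIT\)/ { print $2 }' | sort -u
}

# Demonstrated on a captured sample line:
sample='java 31512 dsadm 240u IPv6 1178391921 0t0 TCP node1:51342->node1:50010 (CLOSE_WAIT)'
printf '%s\n' "$sample" | pids_with_close_wait   # prints 31512

# Live usage (requires lsof and the JDK's jstack on the PATH):
#   for pid in $(lsof -n -i TCP:50010 | pids_with_close_wait); do
#     jstack "$pid" > "jstack-$pid.txt"
#   done
```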
Re: Too many open files error with YARN
Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Hi Hemanth & Sandy,
Thanks for your reply. Yes, that indicates it is in the CLOSE_WAIT state,
exactly like below:
java 30718 dsadm 200u IPv4 1178376459 0t0
TCP *:50010 (LISTEN)
java 31512 dsadm 240u IPv6 1178391921 0t0
TCP node1:51342->node1:50010 (CLOSE_WAIT)
I just checked the link
https://issues.apache.org/jira/browse/HDFS-3357; it shows 2.0.0-alpha in
both the affected versions and the fix versions.
There is another bug 3591, at
https://issues.apache.org/jira/browse/HDFS-3591
which says it is for backporting 3357 to branch 0.23
So, I don't understand whether the fix is really in 2.0.0-alpha; could
you please clarify?
Thanks,
Kishore
On Thu, Mar 21, 2013 at 9:57 AM, Hemanth Yamijala <yhemanth@thoughtworks.com
> wrote:
> There was an issue related to hung connections (HDFS-3357). But the JIRA
> indicates the fix is available in Hadoop-2.0.0-alpha. Still, would be worth
> checking on Sandy's suggestion
>
>
> On Wed, Mar 20, 2013 at 11:09 PM, Sandy Ryza <sa...@cloudera.com> wrote:
>
>> Hi Kishore,
>>
>> 50010 is the datanode port. Does your lsof indicate that the sockets are
>> in CLOSE_WAIT? I had come across an issue like this where that was a
>> symptom.
>>
>> -Sandy
>>
>>
>> On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri <
>> write2kishore@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am running a date command with YARN's distributed shell example in a
>>> loop of 1000 times in this way:
>>>
>>> yarn jar
>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>> org.apache.hadoop.yarn.applications.distributedshell.Client --jar
>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>> --shell_command date --num_containers 2
>>>
>>>
>>> Around the 730th iteration or so, I get an error in the node manager's
>>> log saying that it failed to launch a container because there are "Too
>>> many open files". When I observe through the lsof command, I find that
>>> one file descriptor of this kind is left behind for each run of the
>>> Application Master, and the count keeps growing as I run the loop.
>>>
>>> node1:44871->node1:50010
>>>
>>> Is this a known issue? Or am I missing doing something? Please help.
>>>
>>> Note: I am working on hadoop-2.0.0-alpha
>>>
>>> Thanks,
>>> Kishore
>>>
>>
>>
>
Re: Too many open files error with YARN
Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
There was an issue related to hung connections (HDFS-3357). But the JIRA
indicates the fix is available in Hadoop-2.0.0-alpha. Still, it would be
worth checking on Sandy's suggestion.
On Wed, Mar 20, 2013 at 11:09 PM, Sandy Ryza <sa...@cloudera.com> wrote:
> Hi Kishore,
>
> 50010 is the datanode port. Does your lsof indicate that the sockets are
> in CLOSE_WAIT? I had come across an issue like this where that was a
> symptom.
>
> -Sandy
>
>
> On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
>> Hi,
>>
>> I am running a date command with YARN's distributed shell example in a
>> loop of 1000 times in this way:
>>
>> yarn jar
>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>> org.apache.hadoop.yarn.applications.distributedshell.Client --jar
>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>> --shell_command date --num_containers 2
>>
>>
>> Around the 730th iteration or so, I get an error in the node manager's
>> log saying that it failed to launch a container because there are "Too
>> many open files". When I observe through the lsof command, I find that
>> one file descriptor of this kind is left behind for each run of the
>> Application Master, and the count keeps growing as I run the loop.
>>
>> node1:44871->node1:50010
>>
>> Is this a known issue? Or am I missing doing something? Please help.
>>
>> Note: I am working on hadoop-2.0.0-alpha
>>
>> Thanks,
>> Kishore
>>
>
>
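Independent of which JIRA covers it, the leak itself can be confirmed by watching the NodeManager's descriptor count between loop iterations. A minimal, Linux-only sketch (the /proc interface is assumed, and <NM_PID> is a placeholder for the NodeManager's real PID):

```shell
# Report how many file descriptors a process currently holds, by listing
# its /proc/<pid>/fd directory (Linux-specific).
fd_count() {
  ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Demonstrated on the current shell; for the leak described in this thread,
# point it at the NodeManager and sample between distributed-shell runs:
#   watch -n 5 'ls /proc/<NM_PID>/fd | wc -l'
fd_count $$
```

A count that rises by one per run and never falls back points at sockets left in CLOSE_WAIT, matching the lsof output quoted in this thread.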
Re: Too many open files error with YARN
Posted by Sandy Ryza <sa...@cloudera.com>.
Hi Kishore,
50010 is the datanode port. Does your lsof indicate that the sockets are in
CLOSE_WAIT? I had come across an issue like this where that was a symptom.
-Sandy
On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri <
write2kishore@gmail.com> wrote:
> Hi,
>
> I am running a date command with YARN's distributed shell example in a
> loop of 1000 times in this way:
>
> yarn jar
> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
> org.apache.hadoop.yarn.applications.distributedshell.Client --jar
> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
> --shell_command date --num_containers 2
>
>
> Around the 730th run or so, I get an error in the node manager's log
> saying that it failed to launch a container because there are "Too many
> open files", and when I observe with the lsof command, I find that one
> connection of the following kind is left open for each run of the
> Application Master, and the count keeps growing as the loop runs:
>
> node1:44871->node1:50010
>
> Is this a known issue? Or am I missing something? Please help.
>
> Note: I am working on hadoop-2.0.0-alpha
>
> Thanks,
> Kishore
>
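To see how close a process is to the "Too many open files" threshold, its current descriptor count can be compared against its soft limit. This is a Linux-only sketch (it relies on /proc), and `fd_usage` is an illustrative name, not part of any Hadoop tooling; pass it the NodeManager's pid.

```shell
# Print "used/limit" open file descriptors for a given pid (Linux only;
# relies on /proc/<pid>/fd and /proc/<pid>/limits).
fd_usage() {
  pid="$1"
  # Field 4 of the "Max open files" line is the soft limit.
  limit=$(awk '/Max open files/ { print $4 }' "/proc/$pid/limits")
  # Each entry under /proc/<pid>/fd is one open descriptor.
  used=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
  echo "$used/$limit"
}

# Usage (NM_PID is a placeholder for the NodeManager's process id):
#   fd_usage "$NM_PID"
```

Watching this ratio while the distributed-shell loop runs would show the descriptor count climbing toward the limit if connections are leaking.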