Posted to common-user@hadoop.apache.org by Stas Oskin <st...@gmail.com> on 2009/06/21 12:06:48 UTC

"Too many open files" error, which gets resolved after some time

Hi.

I have HDFS client and HDFS datanode running on same machine.

When I try to access a dozen files at once from the client, several
times in a row, I start to receive the following errors on the client and
in the HDFS browse function.

HDFS Client: "Could not get block locations. Aborting..."
HDFS browse: "Too many open files"

I can increase the maximum number of files that can be opened, as I have it
set to the default 1024, but I would like to solve the problem first, as a
larger value just means it would run out of files again later on.

So my questions are:

1) Does the HDFS datanode keep any files open, even after the HDFS client
has already closed them?

2) Is it possible to find out who keeps the files open - the datanode or the
client (so I can pinpoint the source of the problem)?

Thanks in advance!

Re: "Too many open files" error, which gets resolved after some time

Posted by Steve Loughran <st...@apache.org>.
jason hadoop wrote:
> Yes.
> Otherwise the file descriptors will flow away like water.
> I also strongly suggest having at least 64k file descriptors as the open
> file limit.
> 
> On Sun, Jun 21, 2009 at 12:43 PM, Stas Oskin <st...@gmail.com> wrote:
> 
>> Hi.
>>
>> Thanks for the advice. So you advise explicitly closing each and every file
>> handle that I receive from HDFS?
>>
>> Regards.

I must disagree somewhat.

If you use FileSystem.get() to get your client filesystem class, then 
that instance is shared by all threads/classes that use it. Call close() 
on it and any other thread or class holding a reference is in trouble; 
you have to wait for the finalizers for them to get cleaned up.

If you use FileSystem.newInstance() - which came in fairly recently 
(0.20? 0.21?) - then you can call close() safely.

So: it depends on how you get your handle.

see: https://issues.apache.org/jira/browse/HADOOP-5933
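
As a minimal sketch of the difference, assuming the HADOOP-5933-era API
(the namenode URI below is hypothetical):

   import java.net.URI;
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   public class FsHandles {
     public static void main(String[] args) throws Exception {
       Configuration conf = new Configuration();
       URI uri = URI.create("hdfs://namenode:9000/"); // hypothetical cluster

       // Cached, shared instance: get() hands the same object to every
       // caller with this URI/conf, so closing it breaks other users.
       FileSystem shared = FileSystem.get(uri, conf);

       // Private instance: safe to close without affecting other threads.
       FileSystem own = FileSystem.newInstance(uri, conf);
       try {
         own.exists(new Path("/tmp"));
       } finally {
         own.close(); // releases only this instance's resources
       }
     }
   }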

Also: the too many open files problem can be caused in the NN - you need 
to set up the kernel to have lots more file handles around. Lots.


Re: "Too many open files" error, which gets resolved after some time

Posted by Stas Oskin <st...@gmail.com>.
Hi.

I've started doing just that, and indeed the number of fds held by the
DataNode process has dropped significantly.

My problem is that my own app, which works with DFS, still has dozens of
pipes and epolls open.

The usual level seems to be about 300-400 fds, but when I access several
files on the DFS concurrently, this number easily climbs to 700-800.
Moreover, the number sometimes seems to get stuck above 1000, and only
restarting the app brings it back to 300-400.

Any idea why this happens, and what else can be released to get it working?

Also, every file I open seems to bump the fd count by as much as 12. Any
idea why a single file requires so many fds?

Thanks in advance.

2009/6/22 jason hadoop <ja...@gmail.com>

> Yes.
> Otherwise the file descriptors will flow away like water.
> I also strongly suggest having at least 64k file descriptors as the open
> file limit.

Re: "Too many open files" error, which gets resolved after some time

Posted by jason hadoop <ja...@gmail.com>.
Yes.
Otherwise the file descriptors will flow away like water.
I also strongly suggest having at least 64k file descriptors as the open
file limit.

On Sun, Jun 21, 2009 at 12:43 PM, Stas Oskin <st...@gmail.com> wrote:

> Hi.
>
> Thanks for the advice. So you advise explicitly closing each and every file
> handle that I receive from HDFS?
>
> Regards.
>
> 2009/6/21 jason hadoop <ja...@gmail.com>
>
> > Just to be clear, I second Brian's opinion. Relying on finalizes is a
> very
> > good way to run out of file descriptors.
> >
> > On Sun, Jun 21, 2009 at 9:32 AM, <Br...@nokia.com> wrote:
> >
> > > IMHO, you should never rely on finalizers to release scarce resources
> > since
> > > you don't know when the finalizer will get called, if ever.
> > >
> > > -brian
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: ext jason hadoop [mailto:jason.hadoop@gmail.com]
> > > Sent: Sunday, June 21, 2009 11:19 AM
> > > To: core-user@hadoop.apache.org
> > > Subject: Re: "Too many open files" error, which gets resolved after
> some
> > > time
> > >
> > > HDFS/DFS client uses quite a few file descriptors for each open file.
> > >
> > > Many application developers (but not the hadoop core) rely on the JVM
> > > finalizer methods to close open files.
> > >
> > > This combination, expecially when many HDFS files are open can result
> in
> > > very large demands for file descriptors for Hadoop clients.
> > > We as a general rule never run a cluster with nofile less that 64k, and
> > for
> > > larger clusters with demanding applications have had it set 10x higher.
> I
> > > also believe there was a set of JVM versions that leaked file
> descriptors
> > > used for NIO in the HDFS core. I do not recall the exact details.
> > >
> > > On Sun, Jun 21, 2009 at 5:27 AM, Stas Oskin <st...@gmail.com>
> > wrote:
> > >
> > > > Hi.
> > > >
> > > > After tracing some more with the lsof utility, and I managed to stop
> > the
> > > > growth on the DataNode process, but still have issues with my DFS
> > client.
> > > >
> > > > It seems that my DFS client opens hundreds of pipes and eventpolls.
> > Here
> > > is
> > > > a small part of the lsof output:
> > > >
> > > > java    10508 root  387w  FIFO                0,6           6142565
> > pipe
> > > > java    10508 root  388r  FIFO                0,6           6142565
> > pipe
> > > > java    10508 root  389u  0000               0,10        0  6142566
> > > > eventpoll
> > > > java    10508 root  390u  FIFO                0,6           6135311
> > pipe
> > > > java    10508 root  391r  FIFO                0,6           6135311
> > pipe
> > > > java    10508 root  392u  0000               0,10        0  6135312
> > > > eventpoll
> > > > java    10508 root  393r  FIFO                0,6           6148234
> > pipe
> > > > java    10508 root  394w  FIFO                0,6           6142570
> > pipe
> > > > java    10508 root  395r  FIFO                0,6           6135857
> > pipe
> > > > java    10508 root  396r  FIFO                0,6           6142570
> > pipe
> > > > java    10508 root  397r  0000               0,10        0  6142571
> > > > eventpoll
> > > > java    10508 root  398u  FIFO                0,6           6135319
> > pipe
> > > > java    10508 root  399w  FIFO                0,6           6135319
> > pipe
> > > >
> > > > I'm using FSDataInputStream and FSDataOutputStream, so this might be
> > > > related
> > > > to pipes?
> > > >
> > > > So, my questions are:
> > > >
> > > > 1) What happens these pipes/epolls to appear?
> > > >
> > > > 2) More important, how I can prevent their accumation and growth?
> > > >
> > > > Thanks in advance!
> > > >
> > > > 2009/6/21 Stas Oskin <st...@gmail.com>
> > > >
> > > > > Hi.
> > > > >
> > > > > I have HDFS client and HDFS datanode running on same machine.
> > > > >
> > > > > When I'm trying to access a dozen of files at once from the client,
> > > > several
> > > > > times in a row, I'm starting to receive the following errors on
> > client,
> > > > and
> > > > > HDFS browse function.
> > > > >
> > > > > HDFS Client: "Could not get block locations. Aborting..."
> > > > > HDFS browse: "Too many open files"
> > > > >
> > > > > I can increase the maximum number of files that can opened, as I
> have
> > > it
> > > > > set to the default 1024, but would like to first solve the problem,
> > as
> > > > > larger value just means it would run out of files again later on.
> > > > >
> > > > > So my questions are:
> > > > >
> > > > > 1) Does the HDFS datanode keeps any files opened, even after the
> HDFS
> > > > > client have already closed them?
> > > > >
> > > > > 2) Is it possible to find out, who keeps the opened files -
> datanode
> > or
> > > > > client (so I could pin-point the source of the problem).
> > > > >
> > > > > Thanks in advance!
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> > > http://www.amazon.com/dp/1430219424?tag=jewlerymall
> > > www.prohadoopbook.com a community for Hadoop Professionals
> > >
> >
> >
> >
> > --
> > Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> > http://www.amazon.com/dp/1430219424?tag=jewlerymall
> > www.prohadoopbook.com a community for Hadoop Professionals
> >
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
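
To illustrate the explicit-close advice, a minimal sketch that closes every
stream deterministically instead of relying on finalizers (the helper class
is hypothetical):

   import org.apache.hadoop.fs.FSDataInputStream;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   public class ReadClosePromptly {
     // Read a whole file, returning its descriptors promptly.
     public static byte[] readAll(FileSystem fs, Path path) throws Exception {
       FSDataInputStream in = fs.open(path);
       try {
         byte[] buf = new byte[(int) fs.getFileStatus(path).getLen()];
         in.readFully(0, buf); // positioned read of the entire file
         return buf;
       } finally {
         in.close(); // always close, even on error, instead of waiting for GC
       }
     }
   }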

Re: "Too many open files" error, which gets resolved after some time

Posted by Stas Oskin <st...@gmail.com>.
Hi.

Thanks for the advice. So you advise explicitly closing each and every file
handle that I receive from HDFS?

Regards.

2009/6/21 jason hadoop <ja...@gmail.com>

> Just to be clear, I second Brian's opinion. Relying on finalizers is a very
> good way to run out of file descriptors.

Re: "Too many open files" error, which gets resolved after some time

Posted by jason hadoop <ja...@gmail.com>.
Just to be clear, I second Brian's opinion. Relying on finalizers is a very
good way to run out of file descriptors.

On Sun, Jun 21, 2009 at 9:32 AM, <Br...@nokia.com> wrote:

> IMHO, you should never rely on finalizers to release scarce resources since
> you don't know when the finalizer will get called, if ever.
>
> -brian



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals

Re: "Too many open files" error, which gets resolved after some time

Posted by Steve Loughran <st...@apache.org>.
Scott Carey wrote:
> Furthermore, if for some reason it is required to dispose of any objects after others are GC'd, weak references and a weak reference queue will perform significantly better in throughput and latency - orders of magnitude better - than finalizers.
> 


Good point.

It would make sense for the FileSystem cache to be weakly referenced, so 
that on long-lived processes the client references will get cleaned up 
without waiting for app termination.

Re: "Too many open files" error, which gets resolved after some time

Posted by Scott Carey <sc...@richrelevance.com>.
Furthermore, if for some reason it is required to dispose of any objects after others are GC'd, weak references and a weak reference queue will perform significantly better in throughput and latency - orders of magnitude better - than finalizers.
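
A minimal sketch of that pattern using java.lang.ref (class and method names
here are hypothetical, not Hadoop API):

   import java.io.Closeable;
   import java.io.IOException;
   import java.lang.ref.Reference;
   import java.lang.ref.ReferenceQueue;
   import java.lang.ref.WeakReference;
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;

   public class WeakCleaner {
     private final ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
     private final Map<Reference<?>, Closeable> resources =
         new ConcurrentHashMap<Reference<?>, Closeable>();

     // Hold the owner weakly; remember the scarce resource it guards.
     public void track(Object owner, Closeable resource) {
       resources.put(new WeakReference<Object>(owner, queue), resource);
     }

     // Call periodically or from a daemon thread: once an owner is GC'd,
     // its reference lands on the queue and we close its resource.
     public void drain() {
       Reference<?> ref;
       while ((ref = queue.poll()) != null) {
         Closeable c = resources.remove(ref);
         if (c != null) {
           try { c.close(); } catch (IOException ignored) { }
         }
       }
     }
   }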

On 6/21/09 9:32 AM, "Brian.Levine@nokia.com" <Br...@nokia.com> wrote:

IMHO, you should never rely on finalizers to release scarce resources since you don't know when the finalizer will get called, if ever.

-brian



RE: "Too many open files" error, which gets resolved after some time

Posted by Br...@nokia.com.
IMHO, you should never rely on finalizers to release scarce resources since you don't know when the finalizer will get called, if ever.

-brian

 


Re: "Too many open files" error, which gets resolved after some time

Posted by jason hadoop <ja...@gmail.com>.
HDFS/DFS client uses quite a few file descriptors for each open file.

Many application developers (but not the hadoop core) rely on the JVM
finalizer methods to close open files.

This combination, especially when many HDFS files are open, can result in
very large demands for file descriptors for Hadoop clients.
We as a general rule never run a cluster with nofile less than 64k, and for
larger clusters with demanding applications have had it set 10x higher. I
also believe there was a set of JVM versions that leaked file descriptors
used for NIO in the HDFS core. I do not recall the exact details.
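
One way to raise the nofile limit on Linux, assuming pam_limits is in
effect (the user name below is hypothetical):

   # /etc/security/limits.conf
   hadoop  soft  nofile  65536
   hadoop  hard  nofile  65536

   # then verify from a fresh login shell:
   ulimit -n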

On Sun, Jun 21, 2009 at 5:27 AM, Stas Oskin <st...@gmail.com> wrote:

> Hi.
>
> After tracing some more with the lsof utility, I managed to stop the
> growth on the DataNode process, but I still have issues with my DFS client.
>
> It seems that my DFS client opens hundreds of pipes and eventpolls. Here is
> a small part of the lsof output:
>
> java    10508 root  387w  FIFO                0,6           6142565 pipe
> java    10508 root  388r  FIFO                0,6           6142565 pipe
> java    10508 root  389u  0000               0,10        0  6142566
> eventpoll
> java    10508 root  390u  FIFO                0,6           6135311 pipe
> java    10508 root  391r  FIFO                0,6           6135311 pipe
> java    10508 root  392u  0000               0,10        0  6135312
> eventpoll
> java    10508 root  393r  FIFO                0,6           6148234 pipe
> java    10508 root  394w  FIFO                0,6           6142570 pipe
> java    10508 root  395r  FIFO                0,6           6135857 pipe
> java    10508 root  396r  FIFO                0,6           6142570 pipe
> java    10508 root  397r  0000               0,10        0  6142571
> eventpoll
> java    10508 root  398u  FIFO                0,6           6135319 pipe
> java    10508 root  399w  FIFO                0,6           6135319 pipe
>
> I'm using FSDataInputStream and FSDataOutputStream, so this might be
> related to pipes?
>
> So, my questions are:
>
> 1) What causes these pipes/epolls to appear?
>
> 2) More importantly, how can I prevent their accumulation and growth?
>
> Thanks in advance!
>
> 2009/6/21 Stas Oskin <st...@gmail.com>
>
> > Hi.
> >
> > I have HDFS client and HDFS datanode running on same machine.
> >
> > When I'm trying to access a dozen of files at once from the client,
> several
> > times in a row, I'm starting to receive the following errors on client,
> and
> > HDFS browse function.
> >
> > HDFS Client: "Could not get block locations. Aborting..."
> > HDFS browse: "Too many open files"
> >
> > I can increase the maximum number of files that can opened, as I have it
> > set to the default 1024, but would like to first solve the problem, as
> > larger value just means it would run out of files again later on.
> >
> > So my questions are:
> >
> > 1) Does the HDFS datanode keeps any files opened, even after the HDFS
> > client have already closed them?
> >
> > 2) Is it possible to find out, who keeps the opened files - datanode or
> > client (so I could pin-point the source of the problem).
> >
> > Thanks in advance!
> >
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals

Re: "Too many open files" error, which gets resolved after some time

Posted by Stas Oskin <st...@gmail.com>.
Hi Raghu.

Thanks for the clarification and for explaining the potential issue.

> It is not just the fds: the applications that hit fd limits hit thread
> limits as well. Obviously Hadoop cannot sustain this as the range of
> applications increases. It will be fixed one way or the other.
>

Can you please clarify the thread limit matter?

AFAIK it only happens if the allocated stack is too large, and we are talking
about thousands of threads (a possible solution is described here:
http://candrews.integralblue.com/2009/01/preventing-outofmemoryerror-native-thread/
).

So how is it tied to fds?

Thanks.
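
(If the limit being hit is the native-thread one the linked article's title
refers to, a common mitigation is shrinking the per-thread stack, e.g.
running the JVM with something like -Xss256k; the right value is workload
dependent.)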

Re: "Too many open files" error, which gets resolved after some time

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Stas Oskin wrote:
> Hi.
> 
> Thanks for the explanation.
> 
> Just to clarify, does the extra thread waiting on writing happen with
> multi-threading as well?
> 
> Meaning if I have 10 writing threads for example, would it actually be 70
> fds?

unfortunately, yes.

There are different proposals to fix this: async I/O in Hadoop, RPCs 
for data transfers.

It is not just the fds: the applications that hit fd limits hit thread 
limits as well. Obviously Hadoop cannot sustain this as the range of 
applications increases. It will be fixed one way or the other.

Raghu.



Re: "Too many open files" error, which gets resolved after some time

Posted by Stas Oskin <st...@gmail.com>.
Hi.

Thanks for the explanation.

Just to clarify, does the extra thread waiting on writing happen with
multi-threading as well?

Meaning, if I have 10 writing threads for example, would it actually be 70
fds?

Regards.

2009/8/3 Raghu Angadi <ra...@yahoo-inc.com>


Re: "Too many open files" error, which gets resolved after some time

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
For writes, there is an extra thread waiting on i/o, so it would be 3 
fds more. To simplify the earlier equation, on the client side:

for writes :  max fds (for io bound load) = 7 * #write_streams
for reads  :  max fds (for io bound load) = 4 * #read_streams
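
(So, for example, 10 concurrent write streams can peak at 7 * 10 = 70 fds,
and 10 concurrent read streams at 4 * 10 = 40.)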

The main socket is cleared as soon as you close the stream.
The rest of fds stay for 10 sec (they get reused if you open more 
streams meanwhile).

I hope this is clear enough.

Raghu.



Re: "Too many open files" error, which gets resolved after some time

Posted by Stas Oskin <st...@gmail.com>.
Hi.

I'd like to raise this issue once again, just to clarify a point.

If I have only one thread writing to HDFS, the number of fds should be 4,
resulting from:

1) input
2) output
3) epoll
4) stream itself

And these 4 fds should be cleared out after 10 seconds.

Is this correct?

Thanks in advance for the information!

2009/6/24 Stas Oskin <st...@gmail.com>

> Hi.
>
> So if I open one stream, it should be 4?

Re: "Too many open files" error, which gets resolved after some time

Posted by Stas Oskin <st...@gmail.com>.
Hi.

So if I open one stream, it should be 4?


2009/6/23 Raghu Angadi <ra...@yahoo-inc.com>


Re: "Too many open files" error, which gets resolved after some time

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
how many threads do you have? Number of active threads is very 
important. Normally,

#fds = (3 * #threads_blocked_on_io) + #streams

12 per stream is certainly way off.
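
(For example, one open stream with one thread blocked on I/O gives
(3 * 1) + 1 = 4 fds - nowhere near 12.)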

Raghu.



Re: "Too many open files" error, which gets resolved after some time

Posted by Stas Oskin <st...@gmail.com>.
Hi.

In my case it was actually ~ 12 fd's per stream, which included pipes and
epolls.

Could it be that HDFS opens 3 x 3 (input - output - epoll) fd's per each
thread, which makes it close to the number I mentioned? Or is it always 3 at
most per thread / stream?

Up to 10 sec looks like the correct number; it seems it gets freed around
this time indeed.

Regards.

2009/6/23 Raghu Angadi <ra...@yahoo-inc.com>

> To be more accurate, once you have HADOOP-4346,
>
> fds for epoll and pipes = 3 * threads blocked on Hadoop I/O
>
> Unless you have hundreds of threads at a time, you should not see hundreds
> of these. These fds stay up to 10sec even after the
> threads exit.
>
> I am a bit confused about your exact situation. Please check the number of
> threads if you are still facing the problem.
>
> Raghu.
>
>
> Raghu Angadi wrote:
>
>>
>> since you have HADOOP-4346, you should not have excessive epoll/pipe fds
>> open. First of all do you still have the problem? If yes, how many hadoop
>> streams do you have at a time?
>>
>> System.gc() won't help if you have HADOOP-4346.
>>
>> Raghu.
>>
>>  Thanks for your opinion!
>>>
>>> 2009/6/22 Stas Oskin <st...@gmail.com>
>>>
>>>  Ok, seems this issue is already patched in the Hadoop distro I'm using
>>>> (Cloudera).
>>>>
>>>> Any idea if I still should call GC manually/periodically to clean out
>>>> all
>>>> the stale pipes / epolls?
>>>>
>>>> 2009/6/22 Steve Loughran <st...@apache.org>
>>>>
>>>>  Stas Oskin wrote:
>>>>>
>>>>>  Hi.
>>>>>
>>>>>> So what would be the recommended approach to pre-0.20.x series?
>>>>>>
>>>>>> To ensure each file is used only by one thread, and then it is safe to
>>>>>> close
>>>>>> the handle in that thread?
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>  good question -I'm not sure. For anything you get with
>>>>> FileSystem.get(),
>>>>> it's now dangerous to close, so try just setting the reference to null
>>>>> and
>>>>> hoping that GC will do the finalize() when needed
>>>>>
>>>>>
>>>
>>
>

Re: "Too many open files" error, which gets resolved after some time

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
To be more accurate, once you have HADOOP-4346,

fds for epoll and pipes = 3 * threads blocked on Hadoop I/O

Unless you have hundreds of threads at a time, you should not see 
hundreds of these. These fds stay up to 10sec even after the
threads exit.
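
(As a concrete example: 40 threads blocked on Hadoop I/O would account for
about 3 * 40 = 120 pipe/epoll fds, and those should disappear within seconds
of the threads going idle.)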

I am a bit confused about your exact situation. Please check the number of
threads if you are still facing the problem.

Raghu.

Raghu Angadi wrote:
> 
> since you have HADOOP-4346, you should not have excessive epoll/pipe fds 
> open. First of all do you still have the problem? If yes, how many 
> hadoop streams do you have at a time?
> 
> System.gc() won't help if you have HADOOP-4346.
> 
> Raghu.
> 
>> Thanks for your opinion!
>>
>> 2009/6/22 Stas Oskin <st...@gmail.com>
>>
>>> Ok, seems this issue is already patched in the Hadoop distro I'm using
>>> (Cloudera).
>>>
>>> Any idea if I still should call GC manually/periodically to clean out 
>>> all
>>> the stale pipes / epolls?
>>>
>>> 2009/6/22 Steve Loughran <st...@apache.org>
>>>
>>>> Stas Oskin wrote:
>>>>
>>>>  Hi.
>>>>> So what would be the recommended approach to pre-0.20.x series?
>>>>>
>>>>> To ensure each file is used only by one thread, and then it is safe to
>>>>> close
>>>>> the handle in that thread?
>>>>>
>>>>> Regards.
>>>>>
>>>> good question -I'm not sure. For anything you get with
>>>> FileSystem.get(),
>>>> it's now dangerous to close, so try just setting the reference to
>>>> null and
>>>> hoping that GC will do the finalize() when needed
>>>>
>>
> 


Re: "Too many open files" error, which gets resolved after some time

Posted by Stas Oskin <st...@gmail.com>.
Hi.

In my tests, I typically opened between 20 and 40 concurrent streams.

Regards.

2009/6/23 Raghu Angadi <ra...@yahoo-inc.com>

> Stas Oskin wrote:
>
>> Hi.
>>
>> Any idea if calling System.gc() periodically will help reduce the number
>> of pipes / epolls?
>>
>
> since you have HADOOP-4346, you should not have excessive epoll/pipe fds
> open. First of all do you still have the problem? If yes, how many hadoop
> streams do you have at a time?
>
> System.gc() won't help if you have HADOOP-4346.
>
> Raghu.
>
>
>  Thanks for your opinion!
>>
>> 2009/6/22 Stas Oskin <st...@gmail.com>
>>
>>  Ok, seems this issue is already patched in the Hadoop distro I'm using
>>> (Cloudera).
>>>
>>> Any idea if I still should call GC manually/periodically to clean out all
>>> the stale pipes / epolls?
>>>
>>> 2009/6/22 Steve Loughran <st...@apache.org>
>>>
>>>  Stas Oskin wrote:
>>>>
>>>>  Hi.
>>>>
>>>>> So what would be the recommended approach to pre-0.20.x series?
>>>>>
>>>>> To ensure each file is used only by one thread, and then it is safe to
>>>>> close
>>>>> the handle in that thread?
>>>>>
>>>>> Regards.
>>>>>
>>>>  good question -I'm not sure. For anything you get with
>>>> FileSystem.get(),
>>>> it's now dangerous to close, so try just setting the reference to null
>>>> and
>>>> hoping that GC will do the finalize() when needed
>>>>
>>>>
>>
>

Re: "Too many open files" error, which gets resolved after some time

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Stas Oskin wrote:
> Hi.
> 
> Any idea if calling System.gc() periodically will help reduce the number
> of pipes / epolls?

since you have HADOOP-4346, you should not have excessive epoll/pipe fds 
open. First of all do you still have the problem? If yes, how many 
hadoop streams do you have at a time?

System.gc() won't help if you have HADOOP-4346.

Raghu.

> Thanks for your opinion!
> 
> 2009/6/22 Stas Oskin <st...@gmail.com>
> 
>> Ok, seems this issue is already patched in the Hadoop distro I'm using
>> (Cloudera).
>>
>> Any idea if I still should call GC manually/periodically to clean out all
>> the stale pipes / epolls?
>>
>> 2009/6/22 Steve Loughran <st...@apache.org>
>>
>>> Stas Oskin wrote:
>>>
>>>  Hi.
>>>> So what would be the recommended approach to pre-0.20.x series?
>>>>
>>>> To ensure each file is used only by one thread, and then it is safe to close
>>>> the handle in that thread?
>>>>
>>>> Regards.
>>>>
>>> good question -I'm not sure. For anything you get with FileSystem.get(),
>>> it's now dangerous to close, so try just setting the reference to null and
>>> hoping that GC will do the finalize() when needed
>>>
> 


Re: "Too many open files" error, which gets resolved after some time

Posted by Stas Oskin <st...@gmail.com>.
Hi.

Thanks for the advice, just to clarify:
The upgrade you speak of, which cleans the pipes/epolls more often - is it
about the issue discussed here (HADOOP-4346, already fixed in my
distribution), or some other issue?

If so, does it have a ticket I can see, or should one be filed in Jira?

Thanks!

2009/6/23 Brian Bockelman <bb...@cse.unl.edu>

> Hey Stas,
>
> It sounds like it's technically possible, but it also sounds like a
> horrible hack: I'd avoid this at all costs.  This is how cruft is born.
>
> The pipes/epolls are something that eventually get cleaned up - but they
> don't get cleaned up often enough for your cluster.  I would recommend just
> increasing the limit on the node itself and then wait for an upgrade to
> "solve" this.
>
> Brian
>
>
> On Jun 23, 2009, at 3:31 AM, Stas Oskin wrote:
>
>  Hi.
>>
>> Any idea if calling System.gc() periodically will help reduce the number
>> of pipes / epolls?
>>
>> Thanks for your opinion!
>>
>> 2009/6/22 Stas Oskin <st...@gmail.com>
>>
>>  Ok, seems this issue is already patched in the Hadoop distro I'm using
>>> (Cloudera).
>>>
>>> Any idea if I still should call GC manually/periodically to clean out all
>>> the stale pipes / epolls?
>>>
>>> 2009/6/22 Steve Loughran <st...@apache.org>
>>>
>>>  Stas Oskin wrote:
>>>>
>>>> Hi.
>>>>
>>>>>
>>>>> So what would be the recommended approach to pre-0.20.x series?
>>>>>
>>>>> To ensure each file is used only by one thread, and then it is safe to
>>>>> close
>>>>> the handle in that thread?
>>>>>
>>>>> Regards.
>>>>>
>>>>>
>>>> good question -I'm not sure. For anything you get with
>>>> FileSystem.get(),
>>>> it's now dangerous to close, so try just setting the reference to null
>>>> and
>>>> hoping that GC will do the finalize() when needed
>>>>
>>>>
>>>
>

Re: "Too many open files" error, which gets resolved after some time

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hey Stas,

It sounds like it's technically possible, but it also sounds like a  
horrible hack: I'd avoid this at all costs.  This is how cruft is
born.

The pipes/epolls are something that eventually get cleaned up - but  
they don't get cleaned up often enough for your cluster.  I would  
recommend just increasing the limit on the node itself and then wait  
for an upgrade to "solve" this.
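
(One common way to raise the limit on a Linux node - assuming PAM limits are
in use - is a nofile line in /etc/security/limits.conf, e.g.
"hadoop - nofile 65536" for the user running the daemons, plus a matching
ulimit -n in whatever script starts them; the exact mechanism varies by
distro.)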

Brian

On Jun 23, 2009, at 3:31 AM, Stas Oskin wrote:

> Hi.
>
> Any idea if calling System.gc() periodically will help reduce the
> number
> of pipes / epolls?
>
> Thanks for your opinion!
>
> 2009/6/22 Stas Oskin <st...@gmail.com>
>
>> Ok, seems this issue is already patched in the Hadoop distro I'm  
>> using
>> (Cloudera).
>>
>> Any idea if I still should call GC manually/periodically to clean  
>> out all
>> the stale pipes / epolls?
>>
>> 2009/6/22 Steve Loughran <st...@apache.org>
>>
>>> Stas Oskin wrote:
>>>
>>> Hi.
>>>>
>>>> So what would be the recommended approach to pre-0.20.x series?
>>>>
>>>> To ensure each file is used only by one thread, and then it is safe
>>>> to close
>>>> the handle in that thread?
>>>>
>>>> Regards.
>>>>
>>>
>>> good question -I'm not sure. For anything you get with
>>> FileSystem.get(),
>>> it's now dangerous to close, so try just setting the reference to
>>> null and
>>> hoping that GC will do the finalize() when needed
>>>
>>


Re: "Too many open files" error, which gets resolved after some time

Posted by Stas Oskin <st...@gmail.com>.
Hi.

Any idea if calling System.gc() periodically will help reduce the number
of pipes / epolls?

Thanks for your opinion!

2009/6/22 Stas Oskin <st...@gmail.com>

> Ok, seems this issue is already patched in the Hadoop distro I'm using
> (Cloudera).
>
> Any idea if I still should call GC manually/periodically to clean out all
> the stale pipes / epolls?
>
> 2009/6/22 Steve Loughran <st...@apache.org>
>
>> Stas Oskin wrote:
>>
>>  Hi.
>>>
>>> So what would be the recommended approach to pre-0.20.x series?
>>>
>>> To ensure each file is used only by one thread, and then it is safe to close
>>> the handle in that thread?
>>>
>>> Regards.
>>>
>>
>> good question -I'm not sure. For anything you get with FileSystem.get(),
>> it's now dangerous to close, so try just setting the reference to null and
>> hoping that GC will do the finalize() when needed
>>
>

Re: "Too many open files" error, which gets resolved after some time

Posted by Stas Oskin <st...@gmail.com>.
Ok, seems this issue is already patched in the Hadoop distro I'm using
(Cloudera).

Any idea if I still should call GC manually/periodically to clean out all
the stale pipes / epolls?

2009/6/22 Steve Loughran <st...@apache.org>

> Stas Oskin wrote:
>
>> Hi.
>>
>> So what would be the recommended approach to pre-0.20.x series?
>>
>> To ensure each file is used only by one thread, and then it is safe to close
>> the handle in that thread?
>>
>> Regards.
>>
>
> good question -I'm not sure. For anything you get with FileSystem.get(),
> it's now dangerous to close, so try just setting the reference to null and
> hoping that GC will do the finalize() when needed
>

Re: "Too many open files" error, which gets resolved after some time

Posted by Steve Loughran <st...@apache.org>.
Stas Oskin wrote:
> Hi.
> 
> So what would be the recommended approach to pre-0.20.x series?
> 
> To ensure each file is used only by one thread, and then it is safe to close
> the handle in that thread?
> 
> Regards.

good question -I'm not sure. For anything you get with
FileSystem.get(), it's now dangerous to close, so try just setting the
reference to null and hoping that GC will do the finalize() when needed
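
A minimal sketch of that pattern (class name and path are placeholders; this
assumes a pre-0.20 client, where FileSystem.get() hands back a shared,
cached instance):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadOnceSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration()); // shared, cached
    FSDataInputStream in = null;
    try {
      in = fs.open(new Path("/example/file"));           // placeholder path
      byte[] buf = new byte[4096];
      while (in.read(buf) > 0) {
        // consume the data
      }
    } finally {
      if (in != null) {
        in.close(); // release the stream's fds in the thread that used them
      }
    }
    // Do NOT call fs.close(): other threads may share this instance.
    // Drop the reference and let GC run finalize() when nothing uses it.
    fs = null;
  }
}

The point being: every stream gets closed in the thread that used it, while
the shared FileSystem itself is left alone.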

Re: "Too many open files" error, which gets resolved after some time

Posted by Stas Oskin <st...@gmail.com>.
Hi.

So what would be the recommended approach to pre-0.20.x series?

To ensure each file is used only by one thread, and then it is safe to close
the handle in that thread?

Regards.

2009/6/22 Steve Loughran <st...@apache.org>

> Raghu Angadi wrote:
>
>>
>> Is this before 0.20.0? Assuming you have closed these streams, it is
>> mostly https://issues.apache.org/jira/browse/HADOOP-4346
>>
>> It is the JDK internal implementation that depends on GC to free up its
>> cache of selectors. HADOOP-4346 avoids this by using hadoop's own cache.
>>
>
> yes, and it's that change that led to my stack traces :(
>
> http://jira.smartfrog.org/jira/browse/SFOS-1208
>

Re: "Too many open files" error, which gets resolved after some time

Posted by Steve Loughran <st...@apache.org>.
Raghu Angadi wrote:
> 
> Is this before 0.20.0? Assuming you have closed these streams, it is 
> mostly https://issues.apache.org/jira/browse/HADOOP-4346
> 
> It is the JDK internal implementation that depends on GC to free up its 
> cache of selectors. HADOOP-4346 avoids this by using hadoop's own cache.

yes, and it's that change that led to my stack traces :(

http://jira.smartfrog.org/jira/browse/SFOS-1208

Re: "Too many open files" error, which gets resolved after some time

Posted by Stas Oskin <st...@gmail.com>.
Hi Raghu.

A question - this issue does not affect Hadoop itself (DataNodes,
etc...), but rather any application using DFS, correct?

If so, without the patch I should either increase the fd limit (which might
fill up as well?), or periodically trigger the GC?

Regards.


2009/6/22 Raghu Angadi <ra...@yahoo-inc.com>

>
> 64k might help in the sense that you might hit GC before you hit the limit.
>
> Otherwise, your only options are to use the patch attached to HADOOP-4346
> or run System.gc() occasionally.
>
> I think it should be committed to 0.18.4
>
>
> Raghu.
>
> Stas Oskin wrote:
>
>> Hi.
>>
>> Yes, it happens with 0.18.3.
>>
>> I'm closing now every FSData stream I receive from HDFS, so the number of
>> open fd's in DataNode is reduced.
>>
>> Problem is that my own DFS client still has a high number of fd's open,
>> mostly pipes and epolls.
>> They sometimes quickly drop to the level of ~400 - 500, and sometimes just
>> get stuck at ~1000.
>>
>> I'm still trying to find out how well it behaves if I set the maximum fd
>> number to 65K.
>>
>> Regards.
>>
>>
>>
>> 2009/6/22 Raghu Angadi <ra...@yahoo-inc.com>
>>
>>  Is this before 0.20.0? Assuming you have closed these streams, it is
>>> mostly
>>> https://issues.apache.org/jira/browse/HADOOP-4346
>>>
>>> It is the JDK internal implementation that depends on GC to free up its
>>> cache of selectors. HADOOP-4346 avoids this by using hadoop's own cache.
>>>
>>> Raghu.
>>>
>>>
>>> Stas Oskin wrote:
>>>
>>>  Hi.
>>>>
>>>> After tracing some more with the lsof utility, I managed to stop the
>>>> growth on the DataNode process, but I still have issues with my DFS
>>>> client.
>>>>
>>>> It seems that my DFS client opens hundreds of pipes and eventpolls. Here
>>>> is
>>>> a small part of the lsof output:
>>>>
>>>> java    10508 root  387w  FIFO                0,6           6142565 pipe
>>>> java    10508 root  388r  FIFO                0,6           6142565 pipe
>>>> java    10508 root  389u  0000               0,10        0  6142566
>>>> eventpoll
>>>> java    10508 root  390u  FIFO                0,6           6135311 pipe
>>>> java    10508 root  391r  FIFO                0,6           6135311 pipe
>>>> java    10508 root  392u  0000               0,10        0  6135312
>>>> eventpoll
>>>> java    10508 root  393r  FIFO                0,6           6148234 pipe
>>>> java    10508 root  394w  FIFO                0,6           6142570 pipe
>>>> java    10508 root  395r  FIFO                0,6           6135857 pipe
>>>> java    10508 root  396r  FIFO                0,6           6142570 pipe
>>>> java    10508 root  397r  0000               0,10        0  6142571
>>>> eventpoll
>>>> java    10508 root  398u  FIFO                0,6           6135319 pipe
>>>> java    10508 root  399w  FIFO                0,6           6135319 pipe
>>>>
>>>> I'm using FSDataInputStream and FSDataOutputStream, so this might be
>>>> related
>>>> to pipes?
>>>>
>>>> So, my questions are:
>>>>
>>>> 1) What causes these pipes/epolls to appear?
>>>>
>>>> 2) More important, how can I prevent their accumulation and growth?
>>>>
>>>> Thanks in advance!
>>>>
>>>> 2009/6/21 Stas Oskin <st...@gmail.com>
>>>>
>>>>  Hi.
>>>>
>>>>> I have HDFS client and HDFS datanode running on the same machine.
>>>>>
>>>>> When I'm trying to access a dozen files at once from the client,
>>>>> several
>>>>> times in a row, I'm starting to receive the following errors on client,
>>>>> and
>>>>> HDFS browse function.
>>>>>
>>>>> HDFS Client: "Could not get block locations. Aborting..."
>>>>> HDFS browse: "Too many open files"
>>>>>
>>>>> I can increase the maximum number of files that can be opened, as I have
>>>>> it
>>>>> set to the default 1024, but would like to first solve the problem, as a
>>>>> larger value just means it would run out of files again later on.
>>>>>
>>>>> So my questions are:
>>>>>
>>>>> 1) Does the HDFS datanode keep any files open, even after the HDFS
>>>>> client has already closed them?
>>>>>
>>>>> 2) Is it possible to find out who keeps the files open - the datanode or
>>>>> the client (so I could pinpoint the source of the problem)?
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>>
>>>>>
>>
>

Re: "Too many open files" error, which gets resolved after some time

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
64k might help in the sense that you might hit GC before you hit the limit.

Otherwise, your only options are to use the patch attached to 
HADOOP-4346 or run System.gc() occasionally.

I think it should be committed to 0.18.4
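
For reference, the System.gc() stopgap can be as small as a daemon Timer
that forces a collection once a minute (a made-up sketch - crude, and
unnecessary once you have HADOOP-4346):

import java.util.Timer;
import java.util.TimerTask;

public class GcWorkaround {
  public static void start() {
    Timer timer = new Timer("gc-workaround", true); // daemon thread
    timer.scheduleAtFixedRate(new TimerTask() {
      public void run() {
        System.gc(); // nudges the JDK into finalizing unused cached selectors
      }
    }, 60000L, 60000L); // first run after a minute, then every minute
  }
}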

Raghu.

Stas Oskin wrote:
> Hi.
> 
> Yes, it happens with 0.18.3.
> 
> I'm closing now every FSData stream I receive from HDFS, so the number of
> open fd's in DataNode is reduced.
> 
> Problem is that my own DFS client still has a high number of fd's open,
> mostly pipes and epolls.
> They sometimes quickly drop to the level of ~400 - 500, and sometimes just
> get stuck at ~1000.
> 
> I'm still trying to find out how well it behaves if I set the maximum fd
> number to 65K.
> 
> Regards.
> 
> 
> 
> 2009/6/22 Raghu Angadi <ra...@yahoo-inc.com>
> 
>> Is this before 0.20.0? Assuming you have closed these streams, it is mostly
>> https://issues.apache.org/jira/browse/HADOOP-4346
>>
>> It is the JDK internal implementation that depends on GC to free up its
>> cache of selectors. HADOOP-4346 avoids this by using hadoop's own cache.
>>
>> Raghu.
>>
>>
>> Stas Oskin wrote:
>>
>>> Hi.
>>>
>>> After tracing some more with the lsof utility, I managed to stop the
>>> growth on the DataNode process, but I still have issues with my DFS client.
>>>
>>> It seems that my DFS client opens hundreds of pipes and eventpolls. Here
>>> is
>>> a small part of the lsof output:
>>>
>>> java    10508 root  387w  FIFO                0,6           6142565 pipe
>>> java    10508 root  388r  FIFO                0,6           6142565 pipe
>>> java    10508 root  389u  0000               0,10        0  6142566
>>> eventpoll
>>> java    10508 root  390u  FIFO                0,6           6135311 pipe
>>> java    10508 root  391r  FIFO                0,6           6135311 pipe
>>> java    10508 root  392u  0000               0,10        0  6135312
>>> eventpoll
>>> java    10508 root  393r  FIFO                0,6           6148234 pipe
>>> java    10508 root  394w  FIFO                0,6           6142570 pipe
>>> java    10508 root  395r  FIFO                0,6           6135857 pipe
>>> java    10508 root  396r  FIFO                0,6           6142570 pipe
>>> java    10508 root  397r  0000               0,10        0  6142571
>>> eventpoll
>>> java    10508 root  398u  FIFO                0,6           6135319 pipe
>>> java    10508 root  399w  FIFO                0,6           6135319 pipe
>>>
>>> I'm using FSDataInputStream and FSDataOutputStream, so this might be
>>> related
>>> to pipes?
>>>
>>> So, my questions are:
>>>
>>> 1) What causes these pipes/epolls to appear?
>>>
>>> 2) More important, how can I prevent their accumulation and growth?
>>>
>>> Thanks in advance!
>>>
>>> 2009/6/21 Stas Oskin <st...@gmail.com>
>>>
>>>  Hi.
>>>> I have HDFS client and HDFS datanode running on the same machine.
>>>>
>>>> When I'm trying to access a dozen files at once from the client,
>>>> several
>>>> times in a row, I'm starting to receive the following errors on client,
>>>> and
>>>> HDFS browse function.
>>>>
>>>> HDFS Client: "Could not get block locations. Aborting..."
>>>> HDFS browse: "Too many open files"
>>>>
>>>> I can increase the maximum number of files that can be opened, as I have it
>>>> set to the default 1024, but would like to first solve the problem, as a
>>>> larger value just means it would run out of files again later on.
>>>>
>>>> So my questions are:
>>>>
>>>> 1) Does the HDFS datanode keep any files open, even after the HDFS
>>>> client has already closed them?
>>>>
>>>> 2) Is it possible to find out who keeps the files open - the datanode or
>>>> the client (so I could pinpoint the source of the problem)?
>>>>
>>>> Thanks in advance!
>>>>
>>>>
> 


Re: "Too many open files" error, which gets resolved after some time

Posted by Stas Oskin <st...@gmail.com>.
Hi.

Yes, it happens with 0.18.3.

I'm closing now every FSData stream I receive from HDFS, so the number of
open fd's in DataNode is reduced.

Problem is that my own DFS client still has a high number of fd's open,
mostly pipes and epolls.
They sometimes quickly drop to the level of ~400 - 500, and sometimes just
get stuck at ~1000.

I'm still trying to find out how well it behaves if I set the maximum fd
number to 65K.

Regards.



2009/6/22 Raghu Angadi <ra...@yahoo-inc.com>

>
> Is this before 0.20.0? Assuming you have closed these streams, it is mostly
> https://issues.apache.org/jira/browse/HADOOP-4346
>
> It is the JDK internal implementation that depends on GC to free up its
> cache of selectors. HADOOP-4346 avoids this by using hadoop's own cache.
>
> Raghu.
>
>
> Stas Oskin wrote:
>
>> Hi.
>>
>> After tracing some more with the lsof utility, I managed to stop the
>> growth on the DataNode process, but I still have issues with my DFS client.
>>
>> It seems that my DFS client opens hundreds of pipes and eventpolls. Here
>> is
>> a small part of the lsof output:
>>
>> java    10508 root  387w  FIFO                0,6           6142565 pipe
>> java    10508 root  388r  FIFO                0,6           6142565 pipe
>> java    10508 root  389u  0000               0,10        0  6142566
>> eventpoll
>> java    10508 root  390u  FIFO                0,6           6135311 pipe
>> java    10508 root  391r  FIFO                0,6           6135311 pipe
>> java    10508 root  392u  0000               0,10        0  6135312
>> eventpoll
>> java    10508 root  393r  FIFO                0,6           6148234 pipe
>> java    10508 root  394w  FIFO                0,6           6142570 pipe
>> java    10508 root  395r  FIFO                0,6           6135857 pipe
>> java    10508 root  396r  FIFO                0,6           6142570 pipe
>> java    10508 root  397r  0000               0,10        0  6142571
>> eventpoll
>> java    10508 root  398u  FIFO                0,6           6135319 pipe
>> java    10508 root  399w  FIFO                0,6           6135319 pipe
>>
>> I'm using FSDataInputStream and FSDataOutputStream, so this might be
>> related
>> to pipes?
>>
>> So, my questions are:
>>
>> 1) What causes these pipes/epolls to appear?
>>
>> 2) More important, how can I prevent their accumulation and growth?
>>
>> Thanks in advance!
>>
>> 2009/6/21 Stas Oskin <st...@gmail.com>
>>
>>  Hi.
>>>
>>> I have HDFS client and HDFS datanode running on the same machine.
>>>
>>> When I'm trying to access a dozen files at once from the client,
>>> several
>>> times in a row, I'm starting to receive the following errors on client,
>>> and
>>> HDFS browse function.
>>>
>>> HDFS Client: "Could not get block locations. Aborting..."
>>> HDFS browse: "Too many open files"
>>>
>>> I can increase the maximum number of files that can be opened, as I have it
>>> set to the default 1024, but would like to first solve the problem, as a
>>> larger value just means it would run out of files again later on.
>>>
>>> So my questions are:
>>>
>>> 1) Does the HDFS datanode keep any files open, even after the HDFS
>>> client has already closed them?
>>>
>>> 2) Is it possible to find out who keeps the files open - the datanode or
>>> the client (so I could pinpoint the source of the problem)?
>>>
>>> Thanks in advance!
>>>
>>>
>>
>

Re: "Too many open files" error, which gets resolved after some time

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Is this before 0.20.0? Assuming you have closed these streams, it is 
mostly https://issues.apache.org/jira/browse/HADOOP-4346

It is the JDK internal implementation that depends on GC to free up its 
cache of selectors. HADOOP-4346 avoids this by using hadoop's own cache.
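
The idea behind that cache, roughly: hand out and reclaim selectors from an
explicit pool instead of leaving closed ones for GC to collect. A made-up
sketch of the concept (not the actual Hadoop code):

import java.io.IOException;
import java.nio.channels.Selector;
import java.util.ArrayDeque;
import java.util.Deque;

class SelectorPool {
  private final Deque<Selector> free = new ArrayDeque<Selector>();

  synchronized Selector take() throws IOException {
    Selector s = free.poll();
    return (s != null) ? s : Selector.open(); // reuse before creating anew
  }

  synchronized void release(Selector s) {
    free.push(s); // keep it around for the next blocked-I/O wait
  }
}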

Raghu.

Stas Oskin wrote:
> Hi.
> 
> After tracing some more with the lsof utility, I managed to stop the
> growth on the DataNode process, but I still have issues with my DFS client.
> 
> It seems that my DFS client opens hundreds of pipes and eventpolls. Here is
> a small part of the lsof output:
> 
> java    10508 root  387w  FIFO                0,6           6142565 pipe
> java    10508 root  388r  FIFO                0,6           6142565 pipe
> java    10508 root  389u  0000               0,10        0  6142566
> eventpoll
> java    10508 root  390u  FIFO                0,6           6135311 pipe
> java    10508 root  391r  FIFO                0,6           6135311 pipe
> java    10508 root  392u  0000               0,10        0  6135312
> eventpoll
> java    10508 root  393r  FIFO                0,6           6148234 pipe
> java    10508 root  394w  FIFO                0,6           6142570 pipe
> java    10508 root  395r  FIFO                0,6           6135857 pipe
> java    10508 root  396r  FIFO                0,6           6142570 pipe
> java    10508 root  397r  0000               0,10        0  6142571
> eventpoll
> java    10508 root  398u  FIFO                0,6           6135319 pipe
> java    10508 root  399w  FIFO                0,6           6135319 pipe
> 
> I'm using FSDataInputStream and FSDataOutputStream, so this might be related
> to pipes?
> 
> So, my questions are:
> 
> 1) What causes these pipes/epolls to appear?
> 
> 2) More important, how can I prevent their accumulation and growth?
> 
> Thanks in advance!
> 
> 2009/6/21 Stas Oskin <st...@gmail.com>
> 
>> Hi.
>>
>> I have HDFS client and HDFS datanode running on the same machine.
>>
>> When I'm trying to access a dozen files at once from the client, several
>> times in a row, I'm starting to receive the following errors on client, and
>> HDFS browse function.
>>
>> HDFS Client: "Could not get block locations. Aborting..."
>> HDFS browse: "Too many open files"
>>
>> I can increase the maximum number of files that can be opened, as I have it
>> set to the default 1024, but would like to first solve the problem, as a
>> larger value just means it would run out of files again later on.
>>
>> So my questions are:
>>
>> 1) Does the HDFS datanode keep any files open, even after the HDFS
>> client has already closed them?
>>
>> 2) Is it possible to find out who keeps the files open - the datanode or
>> the client (so I could pinpoint the source of the problem)?
>>
>> Thanks in advance!
>>
> 


Re: "Too many open files" error, which gets resolved after some time

Posted by Stas Oskin <st...@gmail.com>.
Hi.

After tracing some more with the lsof utility, I managed to stop the
growth on the DataNode process, but I still have issues with my DFS client.

It seems that my DFS client opens hundreds of pipes and eventpolls. Here is
a small part of the lsof output:

java    10508 root  387w  FIFO                0,6           6142565 pipe
java    10508 root  388r  FIFO                0,6           6142565 pipe
java    10508 root  389u  0000               0,10        0  6142566
eventpoll
java    10508 root  390u  FIFO                0,6           6135311 pipe
java    10508 root  391r  FIFO                0,6           6135311 pipe
java    10508 root  392u  0000               0,10        0  6135312
eventpoll
java    10508 root  393r  FIFO                0,6           6148234 pipe
java    10508 root  394w  FIFO                0,6           6142570 pipe
java    10508 root  395r  FIFO                0,6           6135857 pipe
java    10508 root  396r  FIFO                0,6           6142570 pipe
java    10508 root  397r  0000               0,10        0  6142571
eventpoll
java    10508 root  398u  FIFO                0,6           6135319 pipe
java    10508 root  399w  FIFO                0,6           6135319 pipe

I'm using FSDataInputStream and FSDataOutputStream, so this might be related
to pipes?

So, my questions are:

1) What causes these pipes/epolls to appear?

2) More important, how can I prevent their accumulation and growth?

Thanks in advance!

2009/6/21 Stas Oskin <st...@gmail.com>

> Hi.
>
> I have HDFS client and HDFS datanode running on the same machine.
>
> When I'm trying to access a dozen files at once from the client, several
> times in a row, I'm starting to receive the following errors on client, and
> HDFS browse function.
>
> HDFS Client: "Could not get block locations. Aborting..."
> HDFS browse: "Too many open files"
>
> I can increase the maximum number of files that can be opened, as I have it
> set to the default 1024, but would like to first solve the problem, as a
> larger value just means it would run out of files again later on.
>
> So my questions are:
>
> 1) Does the HDFS datanode keep any files open, even after the HDFS
> client has already closed them?
>
> 2) Is it possible to find out who keeps the files open - the datanode or
> the client (so I could pinpoint the source of the problem)?
>
> Thanks in advance!
>