Posted to dev@cassandra.apache.org by Thomas Downing <td...@proteus-technologies.com> on 2010/07/14 12:09:35 UTC

Too many open files [was Re: Minimizing the impact of compaction on latency and throughput]

On 7/13/2010 9:20 AM, Jonathan Ellis wrote:
> On Tue, Jul 13, 2010 at 4:19 AM, Thomas Downing
> <td...@proteus-technologies.com>  wrote:
>    
>> On a related note:  I am running some feasibility tests looking for
>> high ingest rate capabilities.  While testing Cassandra the problem
>> I've encountered is that it runs out of file handles during compaction.
>>      
> This usually just means "increase the allowed fh via ulimit/etc."
>
> Increasing the memtable thresholds so that you create fewer sstables,
> but larger ones, is also a good idea.  The defaults are small so
> Cassandra can work on a 1GB heap which is much smaller than most
> production ones.  Reasonable rule of thumb: if you have a heap of N
> GB, increase both the throughput and count thresholds by N times.
>
>    
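
For concreteness, Jonathan's two suggestions above translate to roughly the
following (a sketch only: the limits.conf lines assume the daemon runs as a
user named "cassandra", and the storage-conf.xml element names and values are
from memory of the 0.6-era defaults, scaled here for a ~4 GB heap):

    # Raise the per-process fd limit in the startup environment...
    ulimit -n 32768

    # ...or persistently via /etc/security/limits.conf
    cassandra  soft  nofile  32768
    cassandra  hard  nofile  32768

    <!-- storage-conf.xml: defaults are 64 MB / 0.3 million operations;
         scale both by the heap size in GB -->
    <MemtableThroughputInMB>256</MemtableThroughputInMB>
    <MemtableOperationsInMillions>1.2</MemtableOperationsInMillions>
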
Thanks for the suggestion.  I gave it a whirl, but no go.  The file handles
in use stayed at around 500 for the first 30M or so mutates, then within
4 seconds they jumped to about 800, stayed there for about 30 seconds,
then within 5 seconds went over 2022, at which point the server entered
the cycle of "SocketException: Too many open files".  The interesting thing
is that the file limit for this process is 32768.  Note the numbers below
as well.

If there is anything specific you would like me to try, let me know.

Seems like there's some sort of non-linear behavior here.  This behavior is
the same as before I multiplied the Cassandra params by 4 (the number of GB
of heap), which leads me to think that increasing limits, whether file limits
or Cassandra parameters, is likely to be a tail-chasing exercise.

This causes time-out exceptions at the client.  On this exception, my client
closes the connection, waits a bit, then retries.  After a few hours of this
the server still had not recovered.

I killed the clients, and watched the server after that.  The open file
handles dropped by 8, and have stayed there.  The server is, of course, not
throwing SocketException any more.  On the other hand, the server is not
doing anything at all.

When there is no client activity, and the server is idle, there are 155
threads running in the JVM.  They are all in one of three states: almost all
blocked in futex(), a few blocked in accept(), and a few cycling over a
timeout on futex(), gettimeofday(), futex() ...  None are blocked on IO.
I can't attach a debugger (I get IO exceptions trying either socket or local
connections, no surprise), so I don't know of a way to see where in the Java
code the threads are blocking.
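
For what it's worth, a thread dump can usually be had without attaching a
debugger (a sketch, assuming a Sun JDK on Linux):

    # print every thread's Java stack to stdout
    jstack <pid>

    # or ask the JVM itself; the dump goes to the process's own stdout
    kill -QUIT <pid>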

More than one fd can be open on a given file, and many of the open fd's are
on files that have been deleted.  The stale fd's are all on Data.db files in
the data directory, which I keep separate from the commit log directory.

I haven't had a chance to look at the code handling files, and I am not any
sort of Java expert, but could this be due to Java's lazy resource clean-up?
I wonder whether, if you do end up writing your own file handling classes for
O_DIRECT or posix_fadvise or whatever, an explicit close(2) might help.
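
To illustrate the pattern in question (plain Java 6, nothing
Cassandra-specific): a descriptor opened on a file that is later unlinked
stays allocated until close() runs, not until the GC finalizes the stream, so
closing explicitly in a finally block releases it deterministically.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;

    public class ExplicitClose {
        public static void main(String[] args) throws IOException {
            File f = File.createTempFile("example", ".db");
            FileInputStream in = new FileInputStream(f);
            try {
                f.delete();                    // gone from the directory...
                System.out.println(in.read()); // ...but the open fd still works (-1: empty file)
            } finally {
                in.close();                    // only here is the descriptor actually released
            }
        }
    }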

A restart of the client causes immediate SocketExceptions at the server and
timeouts at the client.

I noted on the restart that the open fd's jumped by 32, despite only making
4 connections.  At this point, there were 2028 open files - more than there
were when the exceptions began at 2002 open files.  So it seems like the
exception is not caused by the OS returning EMFILE - unless it was returning
EMFILE for some strange reason, and the bump in open files is due to an
increase in duplicate open files.  (BTW, it's not ENFILE!)

I also noted that although the TimeoutExceptions did not occur immediately
on the client, the SocketExceptions began immediately on the server.  This
does not seem to match up.  I am using the org.apache.cassandra.thrift API
directly, not any higher-level wrapper.

Finally, this jump to 2028 on the restart caused a new symptom.  I only had
the client running a few seconds, but after 15 minutes the server is still
throwing exceptions, even though the open file handles immediately dropped
from 2028 down to 1967.

Thanks for your attention, and all your work,

Thomas Downing

Re: Too many open files [was Re: Minimizing the impact of compaction on latency and throughput]

Posted by Jorge Barrios <jo...@tapulous.com>.
Each of my top-level functions was allocating a Hector client connection at
the top, and releasing it when returning. The problem arose when a top-level
function had to call another top-level function, which led to the same
thread allocating two connections. Hector was not releasing one of them even
though I was explicitly requesting it to be released. This might have been
fixed since then, and like I said, I didn't dig into why it was happening. I
just made sure to pass along the connection instances as necessary and the
problem went away.
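
A rough sketch of that shape, with hypothetical pool and client types standing
in for Hector's (the method names here are illustrative, not Hector's actual
API): borrow the client once at the outermost entry point and pass it down the
call chain, rather than having every top-level function borrow its own.

    // Hypothetical types; only the call pattern matters here.
    interface PooledClient { void batchMutate(String key); }
    interface ClientPool {
        PooledClient borrow();
        void release(PooledClient c);
    }

    class Callers {
        private final ClientPool pool;
        Callers(ClientPool pool) { this.pool = pool; }

        // Problematic shape: topLevelA holds a client while calling topLevelB,
        // which borrows a second one on the same thread.
        void topLevelA_leaky(String key) {
            PooledClient c = pool.borrow();
            try { c.batchMutate(key); topLevelB_leaky(key); }
            finally { pool.release(c); }
        }
        void topLevelB_leaky(String key) {
            PooledClient c = pool.borrow();
            try { c.batchMutate(key); } finally { pool.release(c); }
        }

        // Safer shape: borrow once, pass the same client along.
        void topLevelA(String key) {
            PooledClient c = pool.borrow();
            try { doA(c, key); } finally { pool.release(c); }
        }
        void doA(PooledClient c, String key) { c.batchMutate(key); doB(c, key); }
        void doB(PooledClient c, String key) { c.batchMutate(key); }
    }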

On Wed, Jul 14, 2010 at 11:40 AM, shimi <sh...@gmail.com> wrote:

> Do you mean that you don't release the connection back to the pool?
>
> On 2010 7 14 20:51, "Jorge Barrios" <jo...@tapulous.com> wrote:
>
> Thomas, I had a similar problem a few weeks back. I changed my code to make
> sure that each thread only creates and uses one Hector connection. It seems
> that client sockets are not being released properly, but I didn't have the
> time to dig into it.
>
> Jorge
>
>
>
> On Wed, Jul 14, 2010 at 8:28 AM, Peter Schuller <
> peter.schuller@infidyne.com> wrote:
> >
> > > [snip]
> ...
>
>

Re: Too many open files [was Re: Minimizing the impact of compaction on latency and throughput]

Posted by shimi <sh...@gmail.com>.
Do you mean that you don't release the connection back to the pool?

On 2010 7 14 20:51, "Jorge Barrios" <jo...@tapulous.com> wrote:

Thomas, I had a similar problem a few weeks back. I changed my code to make
sure that each thread only creates and uses one Hector connection. It seems
that client sockets are not being released properly, but I didn't have the
time to dig into it.

Jorge



On Wed, Jul 14, 2010 at 8:28 AM, Peter Schuller <pe...@infidyne.com>
wrote:
>
> > [snip]
...

Re: Too many open files [was Re: Minimizing the impact of compaction on latency and throughput]

Posted by Jorge Barrios <jo...@tapulous.com>.
Thomas, I had a similar problem a few weeks back. I changed my code to make
sure that each thread only creates and uses one Hector connection. It seems
that client sockets are not being released properly, but I didn't have the
time to dig into it.

Jorge

On Wed, Jul 14, 2010 at 8:28 AM, Peter Schuller <peter.schuller@infidyne.com
> wrote:

> > [snip]
> > I'm not sure that is the case.
> >
> > When the server gets into the unrecoverable state, the repeating
> exceptions
> > are indeed "SocketException: Too many open files".
> [snip]
> > Although this is unquestionably a network error,  I don't think it is
> > actually a
> > network problem per se, as the maximum number of sockets open by the
> > Cassandra server at this point is about 8.  When I kill the client,
> > sockets
> > held are just the listening sockets - no sockets in ESTABLISHED or
> > TIMED_WAIT.
>
> Is this based on netstat or lsof or similar? When the node is in the
> state of giving these errors, try inspecting /proc/<pid>/fd or use
> lsof. Presumably you'll see thousands of fds of some category; either
> sockets or files.
>
> (If you already did this, sorry!)
>
> --
> / Peter Schuller
>

Re: Too many open files [was Re: Minimizing the impact of compaction on latency and throughput]

Posted by Peter Schuller <pe...@infidyne.com>.
> [snip]
> I'm not sure that is the case.
>
> When the server gets into the unrecoverable state, the repeating exceptions
> are indeed "SocketException: Too many open files".
[snip]
> Although this is unquestionably a network error,  I don't think it is
> actually a
> network problem per se, as the maximum number of sockets open by the
> Cassandra server at this point is about 8.  When I kill the client,
> sockets
> held are just the listening sockets - no sockets in ESTABLISHED or
> TIMED_WAIT.

Is this based on netstat or lsof or similar? When the node is in the
state of giving these errors, try inspecting /proc/<pid>/fd or use
lsof. Presumably you'll see thousands of fds of some category; either
sockets or files.

(If you already did this, sorry!)
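
For example (standard Linux tooling; <pid> is the Cassandra process id):

    # total descriptors held by the process
    ls /proc/<pid>/fd | wc -l
    lsof -p <pid> | wc -l

    # break them down by type (REG = regular files, IPv4/IPv6/unix = sockets, ...)
    lsof -p <pid> | awk '{print $5}' | sort | uniq -c | sort -rn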

-- 
/ Peter Schuller

Re: Too many open files [was Re: Minimizing the impact of compaction on latency and throughput]

Posted by Thomas Downing <td...@proteus-technologies.com>.
On 7/14/2010 11:07 AM, Jonathan Ellis wrote:
> SocketException means this is coming from the network, not the sstables.
>
> Knowing the full error message would be nice, but just about any
> problem on that end should be fixed by adding connection pooling to
> your client.
>
> (moving to user@)
>
> On Wed, Jul 14, 2010 at 5:09 AM, Thomas Downing
> <td...@proteus-technologies.com>  wrote:
>    
>> On 7/13/2010 9:20 AM, Jonathan Ellis wrote:
>>      
>>> On Tue, Jul 13, 2010 at 4:19 AM, Thomas Downing
>>> <td...@proteus-technologies.com>    wrote:
>>>
>>>        
>>>> On a related note:  I am running some feasibility tests looking for
>>>> high ingest rate capabilities.  While testing Cassandra the problem
>>>> I've encountered is that it runs out of file handles during compaction.
>>>>
>>>>          
>>>
[snip]
I'm not sure that is the case.

When the server gets into the unrecoverable state, the repeating exceptions
are indeed "SocketException: Too many open files".

WARN [main] 2010-07-14 06:08:46,772 TThreadPoolServer.java (line 190) Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:124)
     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
     at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
     at org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:184)
     at org.apache.cassandra.thrift.CassandraDaemon.start(CassandraDaemon.java:149)
     at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:190)
Caused by: java.net.SocketException: Too many open files
     at java.net.PlainSocketImpl.socketAccept(Native Method)
     at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
     at java.net.ServerSocket.implAccept(ServerSocket.java:453)
     at java.net.ServerSocket.accept(ServerSocket.java:421)
     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:119)
     ... 5 more

Although this is unquestionably a network error, I don't think it is
actually a network problem per se, as the maximum number of sockets open by
the Cassandra server at this point is about 8.  When I kill the client, the
only sockets held are the listening sockets - no sockets in ESTABLISHED or
TIMED_WAIT.

I was originally using the client interface provided by Hector, but went to
the direct Thrift API to eliminate moving parts in the puzzle.  When using
Hector, I was using the ClientConnectionPool.  Either way, the behavior is
the same.

Just a further note: my client test jig acquires a single connection, then
uses that connection for successive batch_mutate operations, without closing
it.  It only closes the connection on an exception, or at the end of the run.
If it would be helpful, I can change that to open/mutate/close and repeat to
see what happens.
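
For reference, the open/mutate/close variant would look roughly like the
sketch below, against the 0.6-era Thrift interface (written from memory, so
treat the exact signatures as approximate; "Keyspace1"/"Standard1" and the
row keys are placeholders):

    import java.util.*;

    import org.apache.cassandra.thrift.*;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class OpenMutateClose {
        public static void main(String[] args) throws Exception {
            for (int i = 0; i < 1000; i++) {
                TTransport transport = new TSocket("localhost", 9160);
                try {
                    transport.open();
                    Cassandra.Client client =
                        new Cassandra.Client(new TBinaryProtocol(transport));

                    // one small mutation per iteration
                    Column col = new Column("name".getBytes("UTF-8"),
                                            ("value-" + i).getBytes("UTF-8"),
                                            System.currentTimeMillis() * 1000);
                    ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
                    cosc.setColumn(col);
                    Mutation m = new Mutation();
                    m.setColumn_or_supercolumn(cosc);

                    Map<String, List<Mutation>> byCf =
                        new HashMap<String, List<Mutation>>();
                    byCf.put("Standard1", Collections.singletonList(m));
                    Map<String, Map<String, List<Mutation>>> mutations =
                        new HashMap<String, Map<String, List<Mutation>>>();
                    mutations.put("row-" + i, byCf);

                    client.batch_mutate("Keyspace1", mutations, ConsistencyLevel.ONE);
                } finally {
                    transport.close();   // explicit close on every iteration
                }
            }
        }
    }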

Thanks
Thomas Downing


Re: Too many open files [was Re: Minimizing the impact of compaction on latency and throughput]

Posted by Jonathan Ellis <jb...@gmail.com>.
SocketException means this is coming from the network, not the sstables.

Knowing the full error message would be nice, but just about any
problem on that end should be fixed by adding connection pooling to
your client.

(moving to user@)
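
For anyone following along, the simplest client-side shape for that is a
small fixed set of open Thrift transports kept in a queue and borrowed per
request.  A sketch only (not any particular library's API):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class TinyThriftPool {
        private final BlockingQueue<TTransport> idle;

        public TinyThriftPool(String host, int port, int size) throws Exception {
            idle = new ArrayBlockingQueue<TTransport>(size);
            for (int i = 0; i < size; i++) {
                TTransport t = new TSocket(host, port);
                t.open();
                idle.add(t);
            }
        }

        public TTransport borrow() throws InterruptedException {
            return idle.take();          // blocks when all connections are in use
        }

        public void giveBack(TTransport t) {
            idle.offer(t);               // return the same open transport
        }

        public static Cassandra.Client clientFor(TTransport t) {
            return new Cassandra.Client(new TBinaryProtocol(t));
        }
    }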

On Wed, Jul 14, 2010 at 5:09 AM, Thomas Downing
<td...@proteus-technologies.com> wrote:
> On 7/13/2010 9:20 AM, Jonathan Ellis wrote:
>>
>> On Tue, Jul 13, 2010 at 4:19 AM, Thomas Downing
>> <td...@proteus-technologies.com>  wrote:
>>
>>>
>>> On a related note:  I am running some feasibility tests looking for
>>> high ingest rate capabilities.  While testing Cassandra the problem
>>> I've encountered is that it runs out of file handles during compaction.
>>>
>>
>> This usually just means "increase the allowed fh via ulimit/etc."
>>
>> Increasing the memtable thresholds so that you create fewer sstables,
>> but larger ones, is also a good idea.  The defaults are small so
>> Cassandra can work on a 1GB heap which is much smaller than most
>> production ones.  Reasonable rule of thumb: if you have a heap of N
>> GB, increase both the throughput and count thresholds by N times.
>>
>>
>
> Thanks for the suggestion.  I gave it a whirl, but no go.  The file handles
> in use stayed at around 500 for the first 30M or so mutates, then within
> 4 seconds they jumped to about 800, stayed there for about 30 seconds,
> then within 5 seconds went over 2022, at which point the server entered
> the cycle of "SocketException: Too many open files".  The interesting thing is
> that the file limit for this process is 32768.  Note the numbers below as
> well.
>
> If there is anything specific you would like me to try, let me know.
>
> Seems like there's some sort of non-linear behavior here.  This behavior is
> the same as before I multiplied the Cassandra params by 4 (number of G);
> which leads me to think that increasing limits, whether files or Cassandra
> parameters, is likely to be a tail-chasing exercise.
>
> This causes time-out exceptions at the client.  On this exception, my client
> closes the connection, waits a bit, then retries.  After a few hours of this
> the server still had not recovered.
>
> I killed the clients, and watched the server after that.  The file handles
> open
> dropped by 8, and have stayed there.  The server is, of course, not throwing
> SocketException any more.  On the other hand, the server is not doing
> anything at all.
>
> When there is no client activity, and the server is idle, there are 155
> threads
> running in the JVM.  They are all in one of three states, almost all blocked
> at
> futex( ),  a few blocked at accept( ) , a few cycling over timeout on
> futex(),
> gettimeofday(), futex() ... None are blocked at IO.  I can't attach a
> debugger,
> I get IO exceptions trying either socket or local connections, no surprise,
> so I don't know of a way to get the Java code where the threads are
> blocking.
>
> More than one fd can be open on a given file, and many of the open fd's are
> on files that have been deleted.  The stale fd's are all on Data.db files in
> the
> data directory, which I have separate from the commit log directory.
>
> I haven't had a chance to look at the code handling files, and I am not any
> sort of Java expert, but could this be due to Java's lazy resource clean up?
> I wonder if when considering writing your own file handling classes for
> O_DIRECT or posix_fadvise or whatever, an explicit close(2) might help.
>
> A restart of the client causes immediate SocketExceptions at the server and
> timeouts at the client.
>
> I noted on the restart that the open fd's jumped by 32, despite only making
> 4 connections.  At this point, there were 2028 open files - more than there
> were when the exceptions began at 2002 open files.  So it seems like the
> exception is not caused by the OS returning EMFILE - unless it was returning
> EMFILE for some strange reason, and the bump in open files is due to an
> increase in duplicate open files.  (BTW, it's not ENFILE!).
>
> I also noted that although the TimeoutExceptions did not  occur immediately
> on the client, the SocketExceptions began immediately on the server.  This
> does not seem to match up.  I am using the org.apache.cassandra.thrift API
> directly, not any higher level wrapper.
>
> Finally, this jump to 2028 on the restart caused a new symptom.  I only had
> the client running a few seconds, but after 15 minutes, the server is still
> throwing
> exceptions, even though the open file handles immediately dropped from
> 2028 down to 1967.
>
> Thanks for your attention, and all your work,
>
> Thomas Downing
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Too many open files [was Re: Minimizing the impact of compaction on latency and throughput]

Posted by Peter Schuller <pe...@infidyne.com>.
> As a Cassandra newbie, I'm not sure how to tell, but they all point
> to *.Data.db files, and all under the DataFileDirectory (as spec'ed
> in storage-conf.xml), which is a separate directory from the
> CommitLogDirectory.  I did not see any *Index.db or *Filter.db
> files, but I may have missed them.

The *.Data.db files are indeed sstables.

-- 
/ Peter Schuller

Re: Too many open files [was Re: Minimizing the impact of compaction on latency and throughput]

Posted by Thomas Downing <td...@proteus-technologies.com>.
On 7/14/2010 7:16 AM, Peter Schuller wrote:
>> More than one fd can be open on a given file, and many of the open fd's are
>> on files that have been deleted.  The stale fd's are all on Data.db files in
>> the
>> data directory, which I have separate from the commit log directory.
>>
>> I haven't had a chance to look at the code handling files, and I am not any
>> sort of Java expert, but could this be due to Java's lazy resource clean up?
>> I wonder if when considering writing your own file handling classes for
>> O_DIRECT or posix_fadvise or whatever, an explicit close(2) might help.
>>      
> The fact that there are open fds to deleted files is interesting... I
> wonder if people have reported weird disk space usage in the past
> (since such deleted files would not show up with 'du -sh' but eat
> space on the device until closed).
>
> My general understanding is that Cassandra does specifically rely on
> the GC to know when unused sstables can be removed. However the fact
> that the files are deleted I think means that this is not the problem,
> and the question is rather why open file descriptors/streams are
> leaking to these deleted sstables. But I'm speaking now without
> knowing when/where streams are closed.
>
> Are the deleted files indeed sstables, or was that a bad assumption on my part?
>
>    
As a Cassandra newbie, I'm not sure how to tell, but they all point
to *.Data.db files, and all under the DataFileDirectory (as spec'ed
in storage-conf.xml), which is a separate directory from the
CommitLogDirectory.  I did not see any *Index.db or *Filter.db
files, but I may have missed them.


Re: Too many open files [was Re: Minimizing the impact of compaction on latency and throughput]

Posted by Peter Schuller <pe...@infidyne.com>.
> More than one fd can be open on a given file, and many of the open fd's are
> on files that have been deleted.  The stale fd's are all on Data.db files in
> the
> data directory, which I have separate from the commit log directory.
>
> I haven't had a chance to look at the code handling files, and I am not any
> sort of Java expert, but could this be due to Java's lazy resource clean up?
> I wonder if when considering writing your own file handling classes for
> O_DIRECT or posix_fadvise or whatever, an explicit close(2) might help.

The fact that there are open fds to deleted files is interesting... I
wonder if people have reported weird disk space usage in the past
(since such deleted files would not show up with 'du -sh' but eat
space on the device until closed).
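
Two quick ways to check from the shell (standard Linux tools; <pid> and the
data directory path are placeholders):

    # open descriptors whose files have already been unlinked
    lsof -p <pid> | grep '(deleted)'
    # or, system-wide: every open file with a link count below one
    lsof +L1

    # space pinned by such files shows up in df but not in du
    df -h <data directory>
    du -sh <data directory>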

My general understanding is that Cassandra does specifically rely on
the GC to know when unused sstables can be removed. However the fact
that the files are deleted I think means that this is not the problem,
and the question is rather why open file descriptors/streams are
leaking to these deleted sstables. But I'm speaking now without
knowing when/where streams are closed.

Are the deleted files indeed sstables, or was that a bad assumption on my part?

-- 
/ Peter Schuller
