Posted to user@cassandra.apache.org by Thorsten von Eicken <tv...@rightscale.com> on 2012/01/13 20:01:48 UTC

cassandra hit a wall: Too many open files (98567!)

I'm running a single node cassandra 1.0.6 server which hit a wall yesterday:

ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327
AbstractCassandraDaemon.java (line 133) Fatal exception in thread
Thread[CompactionExecutor:2918,1,main] java.io.IOError:
java.io.FileNotFoundException:
/mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many
open files in system)

After that it stopped working and just sat there with this error
(understandable). I did an lsof and saw that it had 98567 open files,
yikes! An ls in the data directory shows 234011 files. After restarting
it spent about 5 hours compacting, then quieted down. About 173k files
were left in the data directory. I'm using leveled compaction (with
compression). I looked into the .json manifests of the two large CFs:
gen 0 is empty, and most sstables are gen 3 & 4. I have a total of about
150GB of data (compressed). Almost all the sstables are around 3MB in
size. Aren't they supposed to get 10x bigger at higher gens?

This situation can't be healthy, can it? Suggestions?
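
For context, a minimal sketch (Python) of tallying what is actually sitting
in the data directory, by column family and by component type. It assumes
the 1.0.x file naming visible in the error above
(req_word_idx-hc-453661-Data.db) and defaults to the path from that log line:

    # Tally sstable component files in a Cassandra 1.0.x data directory.
    # Assumes names of the form <cf>-hc-<generation>-<Component>
    # (as in the error above); anything else is counted as "other".
    import os
    import re
    import sys
    from collections import Counter

    data_dir = sys.argv[1] if len(sys.argv) > 1 else "/mnt/ebs/data/rslog_production"
    pattern = re.compile(r"^(?P<cf>.+)-hc-(?P<gen>\d+)-(?P<component>.+)$")

    by_cf, by_component, other = Counter(), Counter(), 0
    for name in os.listdir(data_dir):
        m = pattern.match(name)
        if m:
            by_cf[m.group("cf")] += 1
            by_component[m.group("component")] += 1
        else:
            other += 1

    print("files per column family:", by_cf.most_common(10))
    print("files per component type:", by_component.most_common(10))
    print("files not matching the sstable pattern:", other)

Dividing the Data.db count into the total also gives a quick sanity check on
how many component files each sstable carries.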

Re: cassandra hit a wall: Too many open files (98567!)

Posted by Thorsten von Eicken <tv...@rightscale.com>.
Ah, that explains part of the problem indeed. The whole situation still
doesn't make a lot of sense to me, unless the answer is that the default
sstable size with leveled compaction is just no good for large datasets. I
restarted cassandra a few hours ago and it had to open about 32k files
at start-up. Took about 15 minutes. That just can't be good...

I also noticed that when using compression the sstable size specified is
uncompressed, so the actual files tend to be smaller. I now upped the
sstable size to 100MB, which should result in about 40MB files in my
case. Is there a way I can "compact" some of the existing sstables that
are small? For example, I have a level-4 sstable that is 56KB in size
and many more that are rather small. Does nodetool compact do anything
with leveled compaction?
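
Rough arithmetic on why the file count is what it is (illustrative only; the
six-components-per-sstable figure is an assumption about a compressed 1.0.x
table, the rest follows from the numbers in this thread):

    # Back-of-the-envelope file counts under leveled compaction.
    data_mb = 150 * 1024          # ~150GB of compressed data
    files_per_sstable = 6         # Data, Index, Filter, Statistics,
                                  # CompressionInfo, Digest (approximate)

    for sstable_mb in (3, 40):    # ~3MB now, ~40MB after raising the sstable size to 100MB
        sstables = data_mb // sstable_mb
        print("%2d MB sstables -> ~%d sstables, ~%d files"
              % (sstable_mb, sstables, sstables * files_per_sstable))
    # 3 MB sstables -> ~51200 sstables, ~307200 files
    # 40 MB sstables -> ~3840 sstables, ~23040 files

That is in the same ballpark as the 234011 files observed, and suggests the
larger sstable size should cut the count by more than an order of magnitude.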

On 1/18/2012 2:39 AM, Janne Jalkanen wrote:
>
> 1.0.6 has a file leak problem, fixed in 1.0.7. Perhaps this is the reason?
>
> https://issues.apache.org/jira/browse/CASSANDRA-3616
>
> /Janne
>
> On Jan 18, 2012, at 03:52 , dir dir wrote:
>
>> Very interesting... Why do you have so many files open? What kind of
>> system are you building that needs so many open files? Would you tell us?
>> Thanks...
>>
>>
>> On Sat, Jan 14, 2012 at 2:01 AM, Thorsten von Eicken
>> <tve@rightscale.com> wrote:
>>
>>     I'm running a single node cassandra 1.0.6 server which hit a wall
>>     yesterday:
>>
>>     ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327
>>     AbstractCassandraDaemon.java (line 133) Fatal exception in thread
>>     Thread[CompactionExecutor:2918,1,main] java.io.IOError:
>>     java.io.FileNotFoundException:
>>     /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db
>>     (Too many
>>     open files in system)
>>
>>     After that it stopped working and just sat there with this error
>>     (understandable). I did an lsof and saw that it had 98567 open files,
>>     yikes! An ls in the data directory shows 234011 files. After
>>     restarting it spent about 5 hours compacting, then quieted down.
>>     About 173k files were left in the data directory. I'm using leveled
>>     compaction (with compression). I looked into the .json manifests of
>>     the two large CFs: gen 0 is empty, and most sstables are gen 3 & 4.
>>     I have a total of about 150GB of data (compressed). Almost all the
>>     sstables are around 3MB in size. Aren't they supposed to get 10x
>>     bigger at higher gens?
>>
>>     This situation can't be healthy, can it? Suggestions?
>>
>>
>

Re: cassandra hit a wall: Too many open files (98567!)

Posted by Janne Jalkanen <Ja...@ecyrd.com>.
1.0.6 has a file leak problem, fixed in 1.0.7. Perhaps this is the reason?

https://issues.apache.org/jira/browse/CASSANDRA-3616

/Janne

On Jan 18, 2012, at 03:52 , dir dir wrote:

> Very interesting... Why do you have so many files open? What kind of
> system are you building that needs so many open files? Would you tell us?
> Thanks...
> 
> 
> On Sat, Jan 14, 2012 at 2:01 AM, Thorsten von Eicken <tv...@rightscale.com> wrote:
> I'm running a single node cassandra 1.0.6 server which hit a wall yesterday:
> 
> ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327
> AbstractCassandraDaemon.java (line 133) Fatal exception in thread
> Thread[CompactionExecutor:2918,1,main] java.io.IOError:
> java.io.FileNotFoundException:
> /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many
> open files in system)
> 
> After that it stopped working and just sat there with this error
> (understandable). I did an lsof and saw that it had 98567 open files,
> yikes! An ls in the data directory shows 234011 files. After restarting
> it spent about 5 hours compacting, then quieted down. About 173k files
> were left in the data directory. I'm using leveled compaction (with
> compression). I looked into the .json manifests of the two large CFs:
> gen 0 is empty, and most sstables are gen 3 & 4. I have a total of about
> 150GB of data (compressed). Almost all the sstables are around 3MB in
> size. Aren't they supposed to get 10x bigger at higher gens?
> 
> This situation can't be healthy, can it? Suggestions?
> 


Re: cassandra hit a wall: Too many open files (98567!)

Posted by dir dir <si...@gmail.com>.
Very interesting... Why do you have so many files open? What kind of
system are you building that needs so many open files? Would you tell us?
Thanks...


On Sat, Jan 14, 2012 at 2:01 AM, Thorsten von Eicken <tv...@rightscale.com> wrote:

> I'm running a single node cassandra 1.0.6 server which hit a wall
> yesterday:
>
> ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327
> AbstractCassandraDaemon.java (line 133) Fatal exception in thread
> Thread[CompactionExecutor:2918,1,main] java.io.IOError:
> java.io.FileNotFoundException:
> /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many
> open files in system)
>
> After that it stopped working and just sat there with this error
> (understandable). I did an lsof and saw that it had 98567 open files,
> yikes! An ls in the data directory shows 234011 files. After restarting
> it spent about 5 hours compacting, then quieted down. About 173k files
> were left in the data directory. I'm using leveled compaction (with
> compression). I looked into the .json manifests of the two large CFs:
> gen 0 is empty, and most sstables are gen 3 & 4. I have a total of about
> 150GB of data (compressed). Almost all the sstables are around 3MB in
> size. Aren't they supposed to get 10x bigger at higher gens?
>
> This situation can't be healthy, can it? Suggestions?
>

Re: cassandra hit a wall: Too many open files (98567!)

Posted by Sylvain Lebresne <sy...@datastax.com>.
On Fri, Jan 13, 2012 at 8:01 PM, Thorsten von Eicken <tv...@rightscale.com> wrote:
> I'm running a single node cassandra 1.0.6 server which hit a wall yesterday:
>
> ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327
> AbstractCassandraDaemon.java (line 133) Fatal exception in thread
> Thread[CompactionExecutor:2918,1,main] java.io.IOError:
> java.io.FileNotFoundException:
> /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many
> open files in system)
>
> After that it stopped working and just sat there with this error
> (understandable). I did an lsof and saw that it had 98567 open files,
> yikes! An ls in the data directory shows 234011 files. After restarting
> it spent about 5 hours compacting, then quieted down. About 173k files
> were left in the data directory. I'm using leveled compaction (with
> compression). I looked into the .json manifests of the two large CFs:
> gen 0 is empty, and most sstables are gen 3 & 4. I have a total of about
> 150GB of data (compressed). Almost all the sstables are around 3MB in
> size. Aren't they supposed to get 10x bigger at higher gens?

No, with leveled compaction the (max) size of sstables is fixed whatever
the generation is (the default is 5MB, but it's 5MB of uncompressed data,
which we may change, so 3MB sounds about right).
What changes between generations is the number of sstables it can
contain. Gen 1 can have 10 sstables (it can have more, but only
temporarily), Gen 2 can have 100, Gen 3 can have 1000, and so on. So
again, the fact that most sstables are in gen 3 and 4 is expected too.
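
Spelled out for this data set (a rough sketch; the 10x growth per level and
the 5MB default are as described above, the rest is just arithmetic):

    # Illustrative sstable distribution across levels for ~150GB of data,
    # assuming fixed 5MB sstables and 10x more sstables per level.
    sstable_mb = 5
    remaining = (150 * 1024) // sstable_mb   # ~30720 sstables in total

    level, capacity = 1, 10
    while remaining > 0:
        in_level = min(remaining, capacity)
        print("gen %d: capacity %6d sstables, holds ~%6d" % (level, capacity, in_level))
        remaining -= in_level
        level, capacity = level + 1, capacity * 10
    # gen 1: capacity     10 sstables, holds ~    10
    # gen 2: capacity    100 sstables, holds ~   100
    # gen 3: capacity   1000 sstables, holds ~  1000
    # gen 4: capacity  10000 sstables, holds ~ 10000
    # gen 5: capacity 100000 sstables, holds ~ 19610

The exact split depends on how the compressed figure maps to uncompressed
data, but either way nearly everything ends up in the top one or two levels,
which is why so many small files accumulate.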

> This situation can't be healthy, can it? Suggestions?

Leveled compaction uses lots of files (the number is proportional to
the amount of data). It is not necessarily a big problem, as modern OSes
deal with large numbers of open files fairly well (as far as I know, at
least). I would just raise the file descriptor ulimit and not worry too
much about it, unless you have reason to believe that it's an actual
descriptor leak (but given the number of files you have, the number of
open ones doesn't seem off, so I don't think there is one here) or that
it has a performance impact.
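
For the ulimit side, a small sketch of checking and raising the per-process
limit (for the Cassandra daemon itself the limit is normally raised
system-wide, e.g. via /etc/security/limits.conf or the init script, so this
is only to illustrate the knob):

    # Check and, within the hard limit, raise the open-file limit
    # (RLIMIT_NOFILE) for the current process. Unix-only.
    import resource

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft/hard file descriptor limit:", soft, hard)

    # Bump the soft limit up to whatever the hard limit allows.
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    print("new soft limit:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])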

--
Sylvain

Re: cassandra hit a wall: Too many open files (98567!)

Posted by aaron morton <aa...@thelastpickle.com>.
That sounds like too many sstables.

Out of interest, were you using multi-threaded compaction? Just wondering about this:
https://issues.apache.org/jira/browse/CASSANDRA-3711

Can you set the file handle limit to unlimited?

Can you provide some more info on what you see in the data dir, in case it is a bug in leveled compaction?

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/01/2012, at 8:01 AM, Thorsten von Eicken wrote:

> I'm running a single node cassandra 1.0.6 server which hit a wall yesterday:
> 
> ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327
> AbstractCassandraDaemon.java (line 133) Fatal exception in thread
> Thread[CompactionExecutor:2918,1,main] java.io.IOError:
> java.io.FileNotFoundException:
> /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many
> open files in system)
> 
> After that it stopped working and just sat there with this error
> (understandable). I did an lsof and saw that it had 98567 open files,
> yikes! An ls in the data directory shows 234011 files. After restarting
> it spent about 5 hours compacting, then quieted down. About 173k files
> were left in the data directory. I'm using leveled compaction (with
> compression). I looked into the .json manifests of the two large CFs:
> gen 0 is empty, and most sstables are gen 3 & 4. I have a total of about
> 150GB of data (compressed). Almost all the sstables are around 3MB in
> size. Aren't they supposed to get 10x bigger at higher gens?
> 
> This situation can't be healthy, can it? Suggestions?