Posted to user@cassandra.apache.org by "Desimpel, Ignace" <Ig...@nuance.com> on 2013/06/27 15:24:11 UTC
Too many open files and stopped compaction with many pending compaction tasks
On a test with 3 Cassandra servers (version 1.2.5, replication factor 1, leveled compaction), I stored data last night and did not see any problem with Cassandra. On all 3 machines, compaction has already been stopped for several hours. However, one machine reports 650 pending compaction tasks (via JMX).
compaction_throughput_mb_per_sec is 0 (throttling disabled).
concurrent_compactors is 3.
multithreaded_compaction is false.
There is no other load on these machines.
When I start querying (using Thrift), I get a 'too many open files' error on the machine with the pending compaction tasks.
The limits.conf setting for nofile is 65536.
Using 'lsof' and 'wc -l', I count 59577 open files for Cassandra.
The total count of keyspace files on disk is 20464.
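A minimal Python sketch (Linux-only, which is an assumption about the platform) for checking these numbers without lsof. /proc/<pid>/fd holds exactly one entry per open descriptor, whereas `lsof | wc -l` also counts lsof's header line and rows such as memory-mapped files ('mem') that do not consume a descriptor, so the two counts can differ. /proc/<pid>/limits shows the limit the running JVM actually received, which can differ from limits.conf because limits.conf is applied by pam_limits at session start. The function names below are hypothetical.

```python
import os
from typing import Optional, Tuple


def count_open_fds(pid: int) -> int:
    """Number of file descriptors currently open in process `pid` (Linux).

    /proc/<pid>/fd has one entry per open descriptor. Reading another
    user's process requires running as the same user or as root.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_nofile_limit(limits_text: str) -> Optional[int]:
    """Soft 'Max open files' limit parsed from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Remaining columns: soft limit, hard limit, units.
            soft = line[len("Max open files"):].split()[0]
            # Defensive: treat a non-numeric 'unlimited' as no limit.
            return None if soft == "unlimited" else int(soft)
    return None


def fd_headroom(open_fds: int, limit: int) -> Tuple[int, float]:
    """(descriptors still available, percentage of the limit in use)."""
    return limit - open_fds, 100.0 * open_fds / limit


if __name__ == "__main__":
    pid = os.getpid()  # hypothetical: substitute the Cassandra JVM's pid
    if os.path.isdir(f"/proc/{pid}/fd"):  # only meaningful on Linux
        with open(f"/proc/{pid}/limits") as f:
            limit = soft_nofile_limit(f.read())
        used = count_open_fds(pid)
        if limit is not None:
            left, pct = fd_headroom(used, limit)
            print(f"{used} open, soft limit {limit}, {left} left ({pct:.1f}% used)")
```

With the figures above, and assuming every lsof row were a real descriptor, fd_headroom(59577, 65536) gives 5959 descriptors left, about 90.9% of the limit in use.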
The 3 machines have roughly equal data loads of about 60 GB. On 2 machines, every keyspace has 0 or 1 unleveled sstables, but on the troubled machine one keyspace has 670 unleveled sstables. Its per-level sstable histogram is [670, 28, 106, 14], i.e. 818 sstables. An 'ls' on that directory counts 5729 files, which corresponds to the 818 sstables (7 files per sstable).
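The file arithmetic above can be checked with a few lines of Python. All figures are the ones reported in this thread, and 7 component files per sstable is the poster's own count; the 3 files left over are not explained in the thread.

```python
# Figures reported for the troubled node, per level L0..L3.
levels_before = [670, 28, 106, 14]    # while compaction was stalled
levels_after = [0, 10, 109, 644]      # after restart, once compaction finished
files_per_sstable = 7                 # component files per sstable, per the post
files_listed = 5729                   # what `ls` counted in that directory

total_before = sum(levels_before)
expected_files = total_before * files_per_sstable

print(total_before)                   # 818
print(expected_files)                 # 5726
print(files_listed - expected_files)  # 3 files beyond the sstable components
print(sum(levels_after))              # 763
```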
After restarting that machine, Cassandra has 4037 open files, and compaction has restarted. Once it finished, SSTableCountPerLevel = [0, 10, 109, 644].
Also, compaction reports speeds of 2.5 MB/s, which seems slow to me: CPU is below 10%, disk at 15% with peaks to 45% (15000 rpm SCSI), and 14 GB of memory is free.
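To put 2.5 MB/s in perspective, here is a back-of-the-envelope Python sketch. It assumes, purely for illustration, that the node's whole ~60 GB load passes through compaction exactly once; leveled compaction can rewrite the same data more than once, so real totals may be higher. The function name is hypothetical.

```python
def hours_to_compact(volume_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to push volume_gb through compaction at a constant rate.

    Uses 1 GB = 1024 MB and ignores write amplification.
    """
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return volume_gb * 1024 / rate_mb_per_s / 3600


if __name__ == "__main__":
    # 2.5 MB/s is the rate reported above; 100 MB/s is the tested
    # per-disk streaming rate mentioned later in this thread.
    for rate in (2.5, 100.0):
        print(f"{rate:>6} MB/s -> {hours_to_compact(60, rate):.2f} h")
```

Under that assumption, 2.5 MB/s means roughly 6.8 hours of compaction, versus about 10 minutes at 100 MB/s.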
So I am puzzled by the number of open files, the number of unleveled sstables, and the rather slow compaction.
Is there anything that can be done? Or anything I should do so that next time I can gather more useful information?
Regards,
Ignace
Example output of lsof:
java 10968 root 483r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 484u REG 8,1 33554432 29229231 /home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/CommitLog-2-1372260568123.log
java 10968 root 485r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 486r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 487r REG 8,17 39967253 14158943 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-481-Data.db
java 10968 root 488r REG 8,17 58641524 14158942 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-481-Index.db
java 10968 root 489r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 490r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 491r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 492u REG 8,1 33554432 29230501 /home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/CommitLog-2-1372260568134.log
java 10968 root 493r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 494r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 495r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 497u REG 8,1 33554432 29242455 /home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/CommitLog-2-1372260568126.log
java 10968 root 498r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 499r REG 8,17 39725539 14160146 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-1019-Data.db
java 10968 root 500r REG 8,17 56369841 14160005 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-1019-Index.db
java 10968 root 502r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 504r REG 8,17 1989198 14163384 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-922-Data.db
java 10968 root 505r REG 8,17 40679209 14161763 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-543-Data.db
java 10968 root 506u REG 8,1 33554432 29250917 /home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/CommitLog-2-1372260568106.log
java 10968 root 507r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 508r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 509r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 510r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 511u REG 8,1 33554432 29229238 /home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/CommitLog-2-1372260568108.log
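The excerpt above shows the same Data.db file (inode 14156174) held open on many descriptors. A small Python sketch that tallies descriptors per path makes such repeats easy to spot in a full dump. It assumes lsof's default column layout; the function name is hypothetical, and a more robust tool would parse lsof's field output (`lsof -F`) instead.

```python
from collections import Counter
from typing import Iterable


def descriptors_per_file(lsof_lines: Iterable[str]) -> Counter:
    """Tally open file descriptors per path from default-format lsof rows.

    Expected columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME.
    Rows whose FD column does not start with a digit (the header, 'mem',
    'cwd', 'txt', ...) are skipped because they do not hold a descriptor.
    Rows with missing columns are skipped too; this is a sketch, not a
    complete lsof parser.
    """
    counts: Counter = Counter()
    for line in lsof_lines:
        fields = line.split(None, 8)  # keep NAME whole even if it has spaces
        if len(fields) < 9 or not fields[3][:1].isdigit():
            continue
        counts[fields[8].rstrip()] += 1  # drop trailing newline/whitespace
    return counts


if __name__ == "__main__":
    sample = [
        "java 10968 root 483r REG 8,17 10507031 14156174 "
        "/media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/"
        "Ks100K-ReverseLabelFunction-ic-1512-Data.db",
        "java 10968 root 484u REG 8,1 33554432 29229231 "
        "/home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/"
        "CommitLog-2-1372260568123.log",
        "java 10968 root 485r REG 8,17 10507031 14156174 "
        "/media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/"
        "Ks100K-ReverseLabelFunction-ic-1512-Data.db",
    ]
    for path, n in descriptors_per_file(sample).most_common(5):
        print(n, path)
```

Fed the 26 rows shown above, 15 of the 26 descriptors point at Ks100K-ReverseLabelFunction-ic-1512-Data.db.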
RE: Too many open files and stopped compaction with many pending compaction tasks
Posted by "Desimpel, Ignace" <Ig...@nuance.com>.
No: just two 15000 rpm SCSI disks per machine. Each disk can handle more than 100 MB/s of streaming data (tested). iostat reports service times of 2 or 3 ms.
Ubuntu 12.04 LTS, 48 GB memory, 24 CPUs (Xeon X5670).
Cassandra is started with 8 GB.
-----Original Message-----
From: Jeremy Hanna [mailto:jeremy.hanna1234@gmail.com]
Sent: Thursday 27 June 2013 15:36
To: user@cassandra.apache.org
Subject: Re: Too many open files and stopped compaction with many pending compaction tasks
Are you on SSDs?
Re: Too many open files and stopped compaction with many pending compaction tasks
Posted by Jeremy Hanna <je...@gmail.com>.
Are you on SSDs?