Posted to user@cassandra.apache.org by Philippe <wa...@gmail.com> on 2011/08/14 19:32:30 UTC

Scalability question

Hi,

As on-disk SSTables become bigger and bigger because more data is added to
the ring, compactions take longer and longer because each file keeps growing.
Isn't there a point where compacting will take so long that compaction just
can't keep up with the amount of data? It looks to me like that's independent
of the write throughput, just a question of how long it takes.

What am I missing?

Thanks
Philippe
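
One way to tell whether compaction is actually falling behind on a node, assuming a 0.8-era nodetool (the host below is a placeholder):

  # pending compaction tasks; a count that grows and never drains back down
  # means compaction is not keeping up with the write rate
  nodetool -h 127.0.0.1 compactionstats

  # thread-pool backlog (the same pending counts mentioned later in the thread)
  nodetool -h 127.0.0.1 tpstats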

Re: Scalability question

Posted by Jonathan Ellis <jb...@gmail.com>.
This is more an artifact of repair's problems than compaction per se.
We're addressing these in
https://issues.apache.org/jira/browse/CASSANDRA-2816 and
https://issues.apache.org/jira/browse/CASSANDRA-2280.

On Mon, Aug 15, 2011 at 3:06 PM, Philippe <wa...@gmail.com> wrote:
>> It's another reason to avoid major / manual compactions which create a
>> single big SSTable. Minor compactions keep things in buckets, which means
>> newer SSTables can be compacted without needing to read the bigger older tables.
>
> I've never run a major/manual compaction on this ring.
> In my case running repair on a "big" keyspace results in SSTables piling up.
> My problematic node just filled up 483GB (yes, GB) of SSTables. Here are
> the biggest:
> ls -laSrh
> (...)
>
> [file listing of the largest SSTables trimmed; it is reproduced in full in Philippe's message further down]
> On the other nodes the same directory is around 69GB. Why are there so few
> large files there and so many big ones on the repairing node?
> [file listing from the other nodes trimmed; it is reproduced in full in Philippe's message further down]
> This whole compaction thing is getting me worried: how are sites in
> production dealing with SSTables becoming larger and larger and thus taking
> longer and longer to compact? Adding nodes every couple of weeks?
> Philippe



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Scalability question

Posted by Teijo Holzer <th...@wetafx.co.nz>.
Hi,


> Unfortunately my data set really does grow because it's a time series. I'm
> going to add a trick to aggregate old data but it will still grow.

That's fine, then you need to scale horizontally. Simply add a new node when 
the load on a node exceeds a threshold (ballpark figure here is a maximum of 
100GB per node).
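
A quick way to keep an eye on that ballpark figure (the host below is a placeholder):

  # the Load column shows how much data each node currently holds
  nodetool -h 10.0.0.1 ring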

> How often do you repair per day (or is it really continuous?)

Yes, we run repairs & compactions in a continuous loop. A full rolling loop now 
only takes ~3 hours (5 nodes with ~20GB each), so we are running 8 full repair 
loops per day. It used to take much longer when we only ran the repairs 
occasionally.

The repairs need to be performed anyway, so we might as well run them all the
time. The compactions are really only there to keep the number of SSTables low.
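
A minimal sketch of such a loop, with placeholder host and keyspace names (serializing node by node is an assumption about how it is scheduled):

  #!/bin/sh
  # repair, then major-compact, each node in turn, forever
  while true; do
    for h in cass1 cass2 cass3 cass4 cass5; do
      nodetool -h "$h" repair MyKeyspace
      nodetool -h "$h" compact MyKeyspace
    done
  done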

> I've been running experiments and I wonder if your decision to perform
> continuous repairs may not stem from what I observe: I emptied a keyspace and
> started loading data into it (about 18,000 mutations/s). Every time I run a
> repair on that keyspace I get out of sync ranges.
> I just don't see how that is possible given that
>   - none of the nodes are going down
>   - tpstats shows only occasional backlog on the nodes (up to 2000 pending max)
>
> Even weirder: when not writing to the keyspace, it took 4 consecutive repairs
> to not have any out of sync ranges anymore. Is repair probabilistic?

Yes, we observed this as well. This depends (amongst other things) on your RF, 
read/write consistency levels, read repair settings and the current flow of 
data at the time of repair.

The continuous repairs simply minimize the diffs.
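
One way to put a rough number on those diffs after each run is to count what the repair sessions log; the log path and the exact message wording below are assumptions and can differ between versions:

  # each differing range found by a repair session is reported as "out of sync"
  grep -c "out of sync" /var/log/cassandra/system.log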

Cheers,

	T.

> [remainder of the quoted thread trimmed; the quoted messages appear in full further down]


Re: Scalability question

Posted by Philippe <wa...@gmail.com>.
Teijo,

Unfortunately my data set really does grow because it's a time series. I'm
going to add a trick to aggregate old data but it will still grow.

How often do you repair per day (or is it really continuous?)
I've been running experiments and I wonder if your decision to perform
continuous repairs may not stem from what I observe: I emptied a keyspace
and started loading data into it (about 18,000 mutations/s). Every time I
run a repair on that keyspace I get out of sync ranges.
I just don't see how that is possible given that
 - none of the nodes are going down
 - tpstats shows only occasional backlog on the nodes (up to 2000 pending
max)

Even weirder: when not writing to the keyspace, it took 4 consecutive
repairs to not have any out of sync ranges anymore. Is repair probabilistic?

My CFs are created on the following template:

create column family PUBLIC_MONTHLY_20
  with column_type = Super
  with comparator = UTF8Type
  with subcomparator = BytesType
  and min_compaction_threshold=2 and read_repair_chance=0
  and keys_cached = 20
  and rows_cached = 50
  and default_validation_class = CounterColumnType and replicate_on_write=true;

Philippe
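
For context, min_compaction_threshold is the number of similar-sized SSTables that have to pile up in a bucket before a minor compaction fires; the default is 4, and the schema above lowers it to 2. It can be changed afterwards from cassandra-cli, shown here only as a sketch of the knob rather than a recommendation for this cluster:

  update column family PUBLIC_MONTHLY_20 with min_compaction_threshold = 4;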

2011/8/16 Teijo Holzer <th...@wetafx.co.nz>

> [quoted messages trimmed; they appear in full further down]

Re: Scalability question

Posted by Teijo Holzer <th...@wetafx.co.nz>.
Hi,

we have come across this as well. We continuously run rolling repairs 
followed by major compactions, followed by a gc() (or node restart), to get rid 
of all these SSTable files. Combined with aggressive TTLs on most inserts, the 
cluster stays nice and lean.

You don't want your working set to grow indefinitely.

Cheers,

	T.


On 16/08/11 08:08, Philippe wrote:
> [quoted messages trimmed; they appear in full further down]


Re: Scalability question

Posted by Philippe <wa...@gmail.com>.
Forgot to mention that stopping & restarting the server brought the data
directory down to 283GB in less than 1 minute.
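
That drop is consistent with the "-tmp-" files in the listing being in-progress or leftover compaction/stream outputs, which get cleaned up at startup. A quick way to spot them (the data directory path is the packaged default and the keyspace placeholder is hypothetical):

  ls -lh /var/lib/cassandra/data/<keyspace>/*-tmp-*-Data.db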

Philippe
2011/8/15 Philippe <wa...@gmail.com>

> [quoted message trimmed; it is reproduced in full in the message below]

Re: Scalability question

Posted by Philippe <wa...@gmail.com>.
>
> It's another reason to avoid major / manual compactions which create a
> single big SSTable. Minor compactions keep things in buckets, which means
> newer SSTables can be compacted without needing to read the bigger older tables.
>
I've never run a major/manual compaction on this ring.
In my case running repair on a "big" keyspace results in SSTables piling up.
My problematic node just filled up 483GB (yes, GB) of SSTables. Here are
the biggest:
ls -laSrh
(...)

-rw-r--r-- 1 cassandra cassandra  2.7G 2011-08-15 14:13 PUBLIC_MONTHLY_20-g-4581-Data.db
-rw-r--r-- 1 cassandra cassandra  2.7G 2011-08-15 14:52 PUBLIC_MONTHLY_20-g-4641-Data.db
-rw-r--r-- 1 cassandra cassandra  2.8G 2011-08-15 14:39 PUBLIC_MONTHLY_20-tmp-g-4878-Data.db
-rw-r--r-- 1 cassandra cassandra  2.9G 2011-08-15 15:00 PUBLIC_MONTHLY_20-g-4656-Data.db
-rw-r--r-- 1 cassandra cassandra  3.0G 2011-08-15 14:17 PUBLIC_MONTHLY_20-g-4599-Data.db
-rw-r--r-- 1 cassandra cassandra  3.0G 2011-08-15 15:11 PUBLIC_MONTHLY_20-g-4675-Data.db
-rw-r--r-- 3 cassandra cassandra  3.1G 2011-08-13 10:34 PUBLIC_MONTHLY_18-g-3861-Data.db
-rw-r--r-- 1 cassandra cassandra  3.2G 2011-08-15 14:41 PUBLIC_MONTHLY_20-tmp-g-4884-Data.db
-rw-r--r-- 1 cassandra cassandra  3.6G 2011-08-15 14:44 PUBLIC_MONTHLY_20-tmp-g-4894-Data.db
-rw-r--r-- 1 cassandra cassandra  3.8G 2011-08-15 14:56 PUBLIC_MONTHLY_20-tmp-g-4934-Data.db
-rw-r--r-- 1 cassandra cassandra  3.8G 2011-08-15 14:46 PUBLIC_MONTHLY_20-tmp-g-4905-Data.db
-rw-r--r-- 1 cassandra cassandra  4.0G 2011-08-15 14:57 PUBLIC_MONTHLY_20-tmp-g-4935-Data.db
-rw-r--r-- 3 cassandra cassandra  5.9G 2011-08-13 12:53 PUBLIC_MONTHLY_19-g-4219-Data.db
-rw-r--r-- 3 cassandra cassandra  6.0G 2011-08-13 13:57 PUBLIC_MONTHLY_20-g-4538-Data.db
-rw-r--r-- 3 cassandra cassandra   12G 2011-08-13 09:27 PUBLIC_MONTHLY_20-g-4501-Data.db

On the other nodes the same directory is around 69GB. Why are there so few
large files there and so many big ones on the repairing node?
-rw-r--r-- 1 cassandra cassandra 434M 2011-08-15 16:02 PUBLIC_MONTHLY_17-g-3525-Data.db
-rw-r--r-- 1 cassandra cassandra 456M 2011-08-15 15:50 PUBLIC_MONTHLY_19-g-4253-Data.db
-rw-r--r-- 1 cassandra cassandra 485M 2011-08-15 14:30 PUBLIC_MONTHLY_20-g-5280-Data.db
-rw-r--r-- 1 cassandra cassandra 572M 2011-08-15 15:15 PUBLIC_MONTHLY_18-g-3774-Data.db
-rw-r--r-- 2 cassandra cassandra 664M 2011-08-09 15:39 PUBLIC_MONTHLY_20-g-4893-Index.db
-rw-r--r-- 2 cassandra cassandra 811M 2011-08-11 21:27 PUBLIC_MONTHLY_16-g-2597-Data.db
-rw-r--r-- 2 cassandra cassandra 915M 2011-08-13 04:00 PUBLIC_MONTHLY_18-g-3695-Data.db
-rw-r--r-- 1 cassandra cassandra 925M 2011-08-15 03:39 PUBLIC_MONTHLY_17-g-3454-Data.db
-rw-r--r-- 1 cassandra cassandra 1.3G 2011-08-15 13:46 PUBLIC_MONTHLY_19-g-4199-Data.db
-rw-r--r-- 2 cassandra cassandra 1.5G 2011-08-10 15:37 PUBLIC_MONTHLY_17-g-3218-Data.db
-rw-r--r-- 1 cassandra cassandra 1.9G 2011-08-15 14:35 PUBLIC_MONTHLY_20-g-5281-Data.db
-rw-r--r-- 2 cassandra cassandra 2.1G 2011-08-10 16:33 PUBLIC_MONTHLY_19-g-3946-Data.db
-rw-r--r-- 2 cassandra cassandra 3.1G 2011-08-10 22:23 PUBLIC_MONTHLY_18-g-3509-Data.db
-rw-r--r-- 2 cassandra cassandra 4.0G 2011-08-10 18:18 PUBLIC_MONTHLY_20-g-5024-Data.db
-rw------- 2 cassandra cassandra 5.1G 2011-08-09 15:23 PUBLIC_MONTHLY_19-g-3847-Data.db
-rw-r--r-- 2 cassandra cassandra 9.6G 2011-08-09 15:39 PUBLIC_MONTHLY_20-g-4893-Data.db

This whole compaction thing is getting me worried: how are sites in
production dealing with SSTables becoming larger and larger and thus taking
longer and longer to compact? Adding nodes every couple of weeks?

Philippe
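
A way to see how much of that 483GB is live data versus SSTables that are already obsolete but not yet deleted, assuming a 0.8-era nodetool and that the output field names below are remembered correctly (the host is a placeholder):

  # compare "Space used (live)" with "Space used (total)" for the PUBLIC_MONTHLY_* CFs;
  # a large gap means old SSTables are still on disk waiting to be cleaned up
  nodetool -h 10.0.0.1 cfstats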

Re: Scalability question

Posted by aaron morton <aa...@thelastpickle.com>.
Multi-threaded compaction helps there: https://issues.apache.org/jira/browse/CASSANDRA-2191

It's another reason to avoid major / manual compactions which create a single big SSTable. Minor compactions keep things in buckets, which means newer SSTables can be compacted without needing to read the bigger older tables.

It's also a reasonable factor to consider when sizing nodes. 
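
A quick way to check how a node is configured for that; the option name (concurrent_compactors, which I believe is the knob that ticket introduced) and the config path are assumptions to verify against your version:

  # number of compactions a node may run in parallel (commented out = default)
  grep -n 'concurrent_compactors' /etc/cassandra/cassandra.yaml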

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 15/08/2011, at 5:32 AM, Philippe wrote:

> Hi,
> 
> As on-disk SSTables become bigger and bigger because more data is added to the ring, compactions take longer and longer because each file keeps growing.
> Isn't there a point where compacting will take so long that compaction just can't keep up with the amount of data? It looks to me like that's independent of the write throughput, just a question of how long it takes.
> 
> What am I missing?
> 
> Thanks
> Philippe