Posted to user@cassandra.apache.org by Aiman Parvaiz <ai...@flipagram.com> on 2016/04/14 10:57:05 UTC

Cassandra 2.1.12 Node size

Hi all,
I am running a 9 node C* 2.1.12 cluster. I seek advice on data size per
node. Each of my nodes has close to 1 TB of data. I am not seeing any issues
as of now, but I wanted to run it by you guys: is this data size pushing the
limits in any manner, and should I be working on reducing the data size per
node? I will be migrating to incremental repairs shortly, and a full repair
currently takes 20 hr/node. I am not seeing any issues with the nodes for now.

Thanks

Re: Cassandra 2.1.12 Node size

Posted by Aiman Parvaiz <ai...@flipagram.com>.
Right now the biggest SSTable I have is 210GB on a 3 TB disk, and total disk
consumed is around 50% on all nodes; I am using STCS. Read and write query
latency is under 15ms. Full repair time is long, but I am sure that switching
to incremental repairs will take care of that. So I am indeed hitting the 50%
disk mark. I recently ran a cleanup, and backups aren't taking much space.
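
For reference, this is roughly how I pulled those numbers (assuming the
default data directory; the keyspace/table names below are placeholders):

nodetool status                              # "Load" column = data size per node
nodetool cfstats my_keyspace.my_table        # space used and sstable count per table
nodetool cfhistograms my_keyspace my_table   # read/write latency percentiles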

On Thu, Apr 14, 2016 at 8:06 PM, Jack Krupansky <ja...@gmail.com>
wrote:

> The four criteria I would suggest for evaluating node size:
>
> 1. Query latency.
> 2. Query throughput/load
> 3. Repair time - worst case, full repair, what you can least afford if it
> happens at the worst time
> 4. Expected growth over the next six to 18 months - you don't want to be
> scrambling with latency, throughput, and repair problems when you bump into
> a wall on capacity. 20% to 30% is a fair number.
>
> Alas, it is very difficult to determine how much spare capacity you have,
> other than an artificial, synthetic load test: Try 30% more clients and
> queries with 30% more (synthetic) data and see what happens to query
> latency, total throughput, and repair time. Run such a test periodically
> (monthly) to get a heads-up when load is getting closer to a wall.
>
> Incremental repair is great to streamline and optimize your day-to-day
> operations, but focus attention on replacement of down nodes during times
> of stress.
>
>
>
> -- Jack Krupansky
>
> On Thu, Apr 14, 2016 at 10:14 AM, Alain RODRIGUEZ <ar...@gmail.com>
> wrote:
>
>> Would adding nodes be the right way to start if I want to get the data
>>> per node down
>>
>>
>> Yes, if everything else is fine, the last and always available option to
>> reduce the disk size per node is to add new nodes. Sometimes it is the
>> first option considered, as it is relatively quick and quite straightforward.
>>
>> Again, 50% of free disk space is not a hard limit. To give you a rough
>> idea, if the biggest sstable is 100 GB and you still have 400 GB free,
>> you will probably be good to go, except if 4 compactions of 100 GB each
>> trigger at the same time, filling up the disk.
>>
>> Now is a good time to think of a plan to handle the growth, but
>> don't worry if data reaches 60%; it will probably not be a big deal.
>>
>> You can make sure that:
>>
>> - There are no snapshots, heap dumps or data not related to C* taking
>> up space
>> - The biggest sstables' tombstone ratios are not too high (are tombstones
>> correctly evicted?)
>> - You are using compression (if you want to)
>>
>> Consider:
>>
>> - Adding TTLs to data you don't want to keep forever, shorten TTLs as
>> much as allowed.
>> - Migrating to C* 3.0+ to take advantage of the new storage engine
>>
>> C*heers,
>> -----------------------
>> Alain Rodriguez - alain@thelastpickle.com
>> France
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>>
>> 2016-04-14 15:41 GMT+02:00 Aiman Parvaiz <ai...@flipagram.com>:
>>
>>> Thanks for the response Alain. I am using STCS and would like to take
>>> some action as we would be hitting 50% disk space pretty soon. Would adding
>>> nodes be the right way to start if I want to get the data per node down?
>>> Otherwise, could you or someone on the list please suggest the right way to
>>> go about it?
>>>
>>> Thanks
>>>
>>> Sent from my iPhone
>>>
>>> On Apr 14, 2016, at 5:17 PM, Alain RODRIGUEZ <ar...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I seek advice on data size per node. Each of my nodes has close to 1 TB
>>>> of data. I am not seeing any issues as of now but wanted to run it by you
>>>> guys if this data size is pushing the limits in any manner and if I should
>>>> be working on reducing data size per node.
>>>
>>>
>>> There is no real limit to the data size other than 50% of the machine
>>> disk space using STCS and 80 % if you are using LCS. Those are 'soft'
>>> limits as it will depend on your biggest sstables size and the number of
>>> concurrent compactions mainly, but to stay away from trouble, it is better
>>> to keep things under control, below the limits mentioned above.
>>>
>>> I will be migrating to incremental repairs shortly and full repair as of
>>>> now takes 20 hr/node. I am not seeing any issues with the nodes for now.
>>>>
>>>
>>> As you noticed, you need to keep in mind that the larger the dataset is,
>>> the longer operations will take. Repairs but also bootstrap or replace
>>> a node, remove a node, or any operation that requires streaming data or reading
>>> it. Repair time can be mitigated by using incremental repairs indeed.
>>>
>>> I am running a 9 node C* 2.1.12 cluster.
>>>>
>>>
>>> It should be quite safe to give incremental repair a try as many bugs
>>> have been fixed in this version:
>>>
>>> FIX 2.1.12 - A lot of sstables using range repairs due to anticompaction
>>> - incremental only
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-10422
>>>
>>> FIX 2.1.12 - repair hang when replica is down - incremental only
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-10288
>>>
>>> If you are using DTCS be aware of
>>> https://issues.apache.org/jira/browse/CASSANDRA-11113
>>>
>>> If using LCS, watch closely sstable and compactions pending counts.
>>>
>>> As a general comment, I would say that Cassandra has evolved to be able
>>> to handle huge datasets (memory structures off-heap + increase of heap size
>>> using G1GC, JBOD, vnodes, ...). Today Cassandra works just fine with big
>>> datasets. I have seen clusters with 4+ TB nodes and others using a few GB per
>>> node. It all depends on your requirements and your machines' specs. If fast
>>> operations are absolutely necessary, keep it small. If you want to use the
>>> entire disk space (50/80% of total disk space max), go ahead as long as
>>> other resources are fine (CPU, memory, disk throughput, ...).
>>>
>>> C*heers,
>>>
>>> -----------------------
>>> Alain Rodriguez - alain@thelastpickle.com
>>> France
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> 2016-04-14 10:57 GMT+02:00 Aiman Parvaiz <ai...@flipagram.com>:
>>>
>>>> Hi all,
>>>> I am running a 9 node C* 2.1.12 cluster. I seek advice on data size per
>>>> node. Each of my nodes has close to 1 TB of data. I am not seeing any issues
>>>> as of now but wanted to run it by you guys if this data size is pushing the
>>>> limits in any manner and if I should be working on reducing data size per
>>>> node. I will be migrating to incremental repairs shortly and full repair as
>>>> of now takes 20 hr/node. I am not seeing any issues with the nodes for now.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
>>>
>>
>


-- 
*Aiman Parvaiz*
Lead Systems Architect
aiman@flipagram.com
cell: 213-300-6377
http://flipagram.com/apz

Re: Cassandra 2.1.12 Node size

Posted by Jack Krupansky <ja...@gmail.com>.
The four criteria I would suggest for evaluating node size:

1. Query latency.
2. Query throughput/load
3. Repair time - worst case, full repair, what you can least afford if it
happens at the worst time
4. Expected growth over the next six to 18 months - you don't want to be
scrambling with latency, throughput, and repair problems when you bump into
a wall on capacity. 20% to 30% is a fair number to plan for.

Alas, it is very difficult to determine how much spare capacity you have,
other than an artificial, synthetic load test: Try 30% more clients and
queries with 30% more (synthetic) data and see what happens to query
latency, total throughput, and repair time. Run such a test periodically
(monthly) to get a heads-up when load is getting closer to a wall.
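
If you don't already have a load driver for that kind of test, the
cassandra-stress tool shipped with Cassandra can at least generate generic
synthetic load (a minimal sketch; the node address and counts are placeholders,
and a user profile would be needed to model your own schema):

cassandra-stress write n=1000000 cl=QUORUM -rate threads=50 -node 10.0.0.1
cassandra-stress read n=1000000 cl=QUORUM -rate threads=50 -node 10.0.0.1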

Incremental repair is great to streamline and optimize your day-to-day
operations, but focus attention on replacement of down nodes during times
of stress.



-- Jack Krupansky

On Thu, Apr 14, 2016 at 10:14 AM, Alain RODRIGUEZ <ar...@gmail.com>
wrote:

> Would adding nodes be the right way to start if I want to get the data per
>> node down
>
>
> Yes, if everything else is fine, the last and always available option to
> reduce the disk size per node is to add new nodes. Sometimes it is the
> first option considered, as it is relatively quick and quite straightforward.
>
> Again, 50% of free disk space is not a hard limit. To give you a rough
> idea, if the biggest sstable is 100 GB and you still have 400 GB free,
> you will probably be good to go, except if 4 compactions of 100 GB each
> trigger at the same time, filling up the disk.
>
> Now is a good time to think of a plan to handle the growth, but
> don't worry if data reaches 60%; it will probably not be a big deal.
>
> You can make sure that:
>
> - There are no snapshots, heap dumps or data not related to C* taking
> up space
> - The biggest sstables' tombstone ratios are not too high (are tombstones
> correctly evicted?)
> - You are using compression (if you want to)
>
> Consider:
>
> - Adding TTLs to data you don't want to keep forever, shorten TTLs as much
> as allowed.
> - Migrating to C* 3.0+ to take advantage of the new storage engine
>
> C*heers,
> -----------------------
> Alain Rodriguez - alain@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
> 2016-04-14 15:41 GMT+02:00 Aiman Parvaiz <ai...@flipagram.com>:
>
>> Thanks for the response Alain. I am using STCS and would like to take
>> some action as we would be hitting 50% disk space pretty soon. Would adding
>> nodes be the right way to start if I want to get the data per node down?
>> Otherwise, could you or someone on the list please suggest the right way to
>> go about it?
>>
>> Thanks
>>
>> Sent from my iPhone
>>
>> On Apr 14, 2016, at 5:17 PM, Alain RODRIGUEZ <ar...@gmail.com> wrote:
>>
>> Hi,
>>
>> I seek advice on data size per node. Each of my nodes has close to 1 TB of
>>> data. I am not seeing any issues as of now but wanted to run it by you guys
>>> if this data size is pushing the limits in any manner and if I should be
>>> working on reducing data size per node.
>>
>>
>> There is no real limit to the data size other than 50% of the machine
>> disk space using STCS and 80 % if you are using LCS. Those are 'soft'
>> limits as it will depend on your biggest sstables size and the number of
>> concurrent compactions mainly, but to stay away from trouble, it is better
>> to keep things under control, below the limits mentioned above.
>>
>> I will be migrating to incremental repairs shortly and full repair as of
>>> now takes 20 hr/node. I am not seeing any issues with the nodes for now.
>>>
>>
>> As you noticed, you need to keep in mind that the larger the dataset is,
>> the longer operations will take. Repairs but also bootstrap or replace a
>> node, remove a node, or any operation that requires streaming data or reading it.
>> Repair time can be mitigated by using incremental repairs indeed.
>>
>> I am running a 9 node C* 2.1.12 cluster.
>>>
>>
>> It should be quite safe to give incremental repair a try as many bugs
>> have been fixed in this version:
>>
>> FIX 2.1.12 - A lot of sstables using range repairs due to anticompaction
>> - incremental only
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-10422
>>
>> FIX 2.1.12 - repair hang when replica is down - incremental only
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-10288
>>
>> If you are using DTCS be aware of
>> https://issues.apache.org/jira/browse/CASSANDRA-11113
>>
>> If using LCS, watch closely sstable and compactions pending counts.
>>
>> As a general comment, I would say that Cassandra has evolved to be able
>> to handle huge datasets (memory structures off-heap + increase of heap size
>> using G1GC, JBOD, vnodes, ...). Today Cassandra works just fine with big
>> datasets. I have seen clusters with 4+ TB nodes and others using a few GB per
>> node. It all depends on your requirements and your machines' specs. If fast
>> operations are absolutely necessary, keep it small. If you want to use the
>> entire disk space (50/80% of total disk space max), go ahead as long as
>> other resources are fine (CPU, memory, disk throughput, ...).
>>
>> C*heers,
>>
>> -----------------------
>> Alain Rodriguez - alain@thelastpickle.com
>> France
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> 2016-04-14 10:57 GMT+02:00 Aiman Parvaiz <ai...@flipagram.com>:
>>
>>> Hi all,
>>> I am running a 9 node C* 2.1.12 cluster. I seek advice on data size per
>>> node. Each of my nodes has close to 1 TB of data. I am not seeing any issues
>>> as of now but wanted to run it by you guys if this data size is pushing the
>>> limits in any manner and if I should be working on reducing data size per
>>> node. I will be migrating to incremental repairs shortly and full repair as
>>> of now takes 20 hr/node. I am not seeing any issues with the nodes for now.
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>
>

Re: Cassandra 2.1.12 Node size

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
>
> Would adding nodes be the right way to start if I want to get the data per
> node down


Yes, if everything else is fine, the last and always available option to
reduce the disk size per node is to add new nodes. Sometimes it is the
first option considered, as it is relatively quick and quite straightforward.

Again, 50% of free disk space is not a hard limit. To give you a rough
idea, if the biggest sstable is 100 GB and you still have 400 GB free,
you will probably be good to go, except if 4 compactions of 100 GB each
trigger at the same time, filling up the disk.
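
If you want to check what your biggest sstables are, something like this
should do it (assuming the default data directory, adjust the path to your
installation):

du -h /var/lib/cassandra/data/*/*/*-Data.db 2>/dev/null | sort -rh | head -5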

Now is a good time to think of a plan to handle the growth, but
don't worry if data reaches 60%; it will probably not be a big deal.

You can make sure that:

- There are no snapshots, heap dumps or data not related to C* taking
up space (example commands to check this are just below the list)
- The biggest sstables' tombstone ratios are not too high (are tombstones
correctly evicted?)
- You are using compression (if you want to)
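
For the first two points, something along these lines can help (the sstable
path is only an example, and you should double-check the exact commands
against your version):

nodetool listsnapshots
nodetool clearsnapshot          # removes snapshots you no longer need
sstablemetadata /var/lib/cassandra/data/my_ks/my_table-*/my_ks-my_table-ka-1-Data.db | grep -i tombstones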

Consider:

- Adding TTLs to data you don't want to keep forever, and shortening TTLs as
much as allowed (see the sketch below).
- Migrating to C* 3.0+ to take advantage of the new storage engine
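
For the TTL part, as a sketch only (the keyspace, table, columns and TTL value
are all made up), it can be done per write or as a table default through cqlsh:

cqlsh -e "INSERT INTO my_ks.events (id, payload) VALUES (uuid(), 'x') USING TTL 604800;"
cqlsh -e "ALTER TABLE my_ks.events WITH default_time_to_live = 604800;"   # 7 days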

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


2016-04-14 15:41 GMT+02:00 Aiman Parvaiz <ai...@flipagram.com>:

> Thanks for the response Alain. I am using STCS and would like to take some
> action as we would be hitting 50% disk space pretty soon. Would adding
> nodes be the right way to start if I want to get the data per node down?
> Otherwise, could you or someone on the list please suggest the right way to
> go about it?
>
> Thanks
>
> Sent from my iPhone
>
> On Apr 14, 2016, at 5:17 PM, Alain RODRIGUEZ <ar...@gmail.com> wrote:
>
> Hi,
>
> I seek advice on data size per node. Each of my nodes has close to 1 TB of
>> data. I am not seeing any issues as of now but wanted to run it by you guys
>> if this data size is pushing the limits in any manner and if I should be
>> working on reducing data size per node.
>
>
> There is no real limit to the data size other than 50% of the machine disk
> space using STCS and 80 % if you are using LCS. Those are 'soft' limits as
> it will depend on your biggest sstables size and the number of concurrent
> compactions mainly, but to stay away from trouble, it is better to keep
> things under control, below the limits mentioned above.
>
> I will be migrating to incremental repairs shortly and full repair as of
>> now takes 20 hr/node. I am not seeing any issues with the nodes for now.
>>
>
> As you noticed, you need to keep in mind that the larger the dataset is,
> the longer operations will take. Repairs but also bootstrap or replace a
> node, remove a node, or any operation that requires streaming data or reading it.
> Repair time can be mitigated by using incremental repairs indeed.
>
> I am running a 9 node C* 2.1.12 cluster.
>>
>
> It should be quite safe to give incremental repair a try as many bugs have
> been fixed in this version:
>
> FIX 2.1.12 - A lot of sstables using range repairs due to anticompaction
> - incremental only
>
> https://issues.apache.org/jira/browse/CASSANDRA-10422
>
> FIX 2.1.12 - repair hang when replica is down - incremental only
>
> https://issues.apache.org/jira/browse/CASSANDRA-10288
>
> If you are using DTCS be aware of
> https://issues.apache.org/jira/browse/CASSANDRA-11113
>
> If using LCS, watch closely sstable and compactions pending counts.
>
> As a general comment, I would say that Cassandra has evolved to be able to
> handle huge datasets (memory structures off-heap + increase of heap size
> using G1GC, JBOD, vnodes, ...). Today Cassandra works just fine with big
> datasets. I have seen clusters with 4+ TB nodes and others using a few GB per
> node. It all depends on your requirements and your machines' specs. If fast
> operations are absolutely necessary, keep it small. If you want to use the
> entire disk space (50/80% of total disk space max), go ahead as long as
> other resources are fine (CPU, memory, disk throughput, ...).
>
> C*heers,
>
> -----------------------
> Alain Rodriguez - alain@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-04-14 10:57 GMT+02:00 Aiman Parvaiz <ai...@flipagram.com>:
>
>> Hi all,
>> I am running a 9 node C* 2.1.12 cluster. I seek advice on data size per
>> node. Each of my nodes has close to 1 TB of data. I am not seeing any issues
>> as of now but wanted to run it by you guys if this data size is pushing the
>> limits in any manner and if I should be working on reducing data size per
>> node. I will be migrating to incremental repairs shortly and full repair as
>> of now takes 20 hr/node. I am not seeing any issues with the nodes for now.
>>
>> Thanks
>>
>>
>>
>>
>

Re: Cassandra 2.1.12 Node size

Posted by Aiman Parvaiz <ai...@flipagram.com>.
Thanks for the response Alain. I am using STCS and would like to take some action, as we will be hitting 50% disk space pretty soon. Would adding nodes be the right way to start if I want to get the data per node down? Otherwise, could you or someone on the list please suggest the right way to go about it?

Thanks

Sent from my iPhone

> On Apr 14, 2016, at 5:17 PM, Alain RODRIGUEZ <ar...@gmail.com> wrote:
> 
> Hi,
> 
>> I seek advice on data size per node. Each of my nodes has close to 1 TB of data. I am not seeing any issues as of now but wanted to run it by you guys if this data size is pushing the limits in any manner and if I should be working on reducing data size per node.
> 
> There is no real limit to the data size other than 50% of the machine disk space using STCS and 80 % if you are using LCS. Those are 'soft' limits as it will depend on your biggest sstables size and the number of concurrent compactions mainly, but to stay away from trouble, it is better to keep things under control, below the limits mentioned above.
> 
>> I will be migrating to incremental repairs shortly and full repair as of now takes 20 hr/node. I am not seeing any issues with the nodes for now.
> 
> As you noticed, you need to keep in mind that the larger the dataset is, the longer operations will take. Repairs but also bootstrap or replace a node, remove a node, or any operation that requires streaming data or reading it. Repair time can be mitigated by using incremental repairs indeed.
> 
>> I am running a 9 node C* 2.1.12 cluster.
> 
> It should be quite safe to give incremental repair a try as many bugs have been fixed in this version:
> 
> FIX 2.1.12 - A lot of sstables using range repairs due to anticompaction - incremental only
> 
> https://issues.apache.org/jira/browse/CASSANDRA-10422
> 
> FIX 2.1.12 - repair hang when replica is down - incremental only
> 
> https://issues.apache.org/jira/browse/CASSANDRA-10288
> 
> If you are using DTCS be aware of https://issues.apache.org/jira/browse/CASSANDRA-11113
> 
> If using LCS, watch closely sstable and compactions pending counts.
> 
> As a general comment, I would say that Cassandra has evolved to be able to handle huge datasets (memory structures off-heap + increase of heap size using G1GC, JBOD, vnodes, ...). Today Cassandra works just fine with big datasets. I have seen clusters with 4+ TB nodes and others using a few GB per node. It all depends on your requirements and your machines' specs. If fast operations are absolutely necessary, keep it small. If you want to use the entire disk space (50/80% of total disk space max), go ahead as long as other resources are fine (CPU, memory, disk throughput, ...).
> 
> C*heers,
> 
> -----------------------
> Alain Rodriguez - alain@thelastpickle.com
> France
> 
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
> 
> 
> 2016-04-14 10:57 GMT+02:00 Aiman Parvaiz <ai...@flipagram.com>:
>> Hi all,
>> I am running a 9 node C* 2.1.12 cluster. I seek advice on data size per node. Each of my nodes has close to 1 TB of data. I am not seeing any issues as of now but wanted to run it by you guys if this data size is pushing the limits in any manner and if I should be working on reducing data size per node. I will be migrating to incremental repairs shortly and full repair as of now takes 20 hr/node. I am not seeing any issues with the nodes for now.
>> 
>> Thanks
> 

Re: Cassandra 2.1.12 Node size

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hi,

I seek advice on data size per node. Each of my nodes has close to 1 TB of
> data. I am not seeing any issues as of now but wanted to run it by you guys
> if this data size is pushing the limits in any manner and if I should be
> working on reducing data size per node.


There is no real limit to the data size other than 50% of the machine disk
space using STCS and 80% if you are using LCS. Those are 'soft' limits, as
they depend mainly on your biggest sstable size and the number of concurrent
compactions, but to stay away from trouble, it is better to keep
things under control, below the limits mentioned above.

I will be migrating to incremental repairs shortly and full repair as of
> now takes 20 hr/node. I am not seeing any issues with the nodes for now.
>

As you noticed, you need to keep in mind that the larger the dataset is,
the longer operations will take: repairs, but also bootstrapping or replacing
a node, removing a node, or any operation that requires streaming or reading
data. Repair time can indeed be mitigated by using incremental repairs.

I am running a 9 node C* 2.1.12 cluster.
>

It should be quite safe to give incremental repair a try as many bugs have
been fixed in this version:

FIX 2.1.12 - A lot of sstables using range repairs due to anticompaction
- incremental only

https://issues.apache.org/jira/browse/CASSANDRA-10422

FIX 2.1.12 - repair hang when replica is down - incremental only

https://issues.apache.org/jira/browse/CASSANDRA-10288
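
To actually run it, the command below should be what you need on 2.1 (the
keyspace name is a placeholder, and do double-check the flags against your
nodetool version; incremental repair has to run in parallel mode):

nodetool repair -par -inc my_keyspace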

If you are using DTCS be aware of
https://issues.apache.org/jira/browse/CASSANDRA-11113

If using LCS, watch the sstable counts and pending compactions closely.
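
A couple of quick ways to keep an eye on that (the keyspace name is a
placeholder, and these work whatever the compaction strategy):

nodetool compactionstats        # pending compaction tasks
nodetool cfstats my_keyspace    # "SSTable count" per table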

As a general comment, I would say that Cassandra has evolved to be able to
handle huge datasets (memory structures off-heap + increase of heap size
using G1GC, JBOD, vnodes, ...). Today Cassandra works just fine with big
datasets. I have seen clusters with 4+ TB nodes and others using a few GB per
node. It all depends on your requirements and your machines' specs. If fast
operations are absolutely necessary, keep it small. If you want to use the
entire disk space (50/80% of total disk space max), go ahead as long as
other resources are fine (CPU, memory, disk throughput, ...).

C*heers,

-----------------------
Alain Rodriguez - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-04-14 10:57 GMT+02:00 Aiman Parvaiz <ai...@flipagram.com>:

> Hi all,
> I am running a 9 node C* 2.1.12 cluster. I seek advice on data size per
> node. Each of my nodes has close to 1 TB of data. I am not seeing any issues
> as of now but wanted to run it by you guys if this data size is pushing the
> limits in any manner and if I should be working on reducing data size per
> node. I will be migrating to incremental repairs shortly and full repair as
> of now takes 20 hr/node. I am not seeing any issues with the nodes for now.
>
> Thanks
>
>
>
>