Posted to user@cassandra.apache.org by Aiman Parvaiz <ai...@steelhouse.com> on 2018/04/20 02:08:10 UTC

Large size KS management

Hi all

I have been given a 15-node C* 2.2.8 cluster to manage which has a large KS (~800GB). Given the size of the KS, most management tasks, like repair, take a long time to complete, and disk space management is becoming tricky from the systems perspective.


This KS size is going to grow in the future, and we have a business requirement for long data retention. I wanted to share this with all of you and ask what my options are: what would be the best way to deal with a KS of this size? To make the situation even trickier, low IO latency is also expected from this cluster.


Thanks in advance for any suggestions/advice.



Re: Large size KS management

Posted by Anup Shirolkar <an...@instaclustr.com>.
Hi Aiman,

Can you please clarify whether the mentioned size of 800GB includes the
Replication Factor (RF) or not? If it does include RF, what is the RF?

Also, what method was used to measure the keyspace data size, e.g. the size
of the data directory, a nodetool command, etc.?
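
For example (the keyspace name and data path below are placeholders), the
numbers can be checked along these lines:

  # Per-node load as reported by Cassandra
  nodetool status

  # Per-table live data size for the keyspace (cfstats on 2.2)
  nodetool cfstats my_keyspace

  # Raw on-disk size of the keyspace directory; this also counts snapshots
  # and not-yet-compacted SSTables, so it can be larger than the "live" figure
  du -sh /var/lib/cassandra/data/my_keyspace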

It would be helpful to know about the cluster node configurations and
topology used.

On the basis of the information we have, 800GB across 15 nodes gives us about
53.33GB of data per node, which is quite normal for a Cassandra cluster.
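
(For illustration: if the 800GB figure is the raw, pre-replication size and
the keyspace uses RF=3, the per-node footprint would instead be roughly
800GB * 3 / 15 = 160GB, which is why the RF question above matters.)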

A question about data growth: at what rate do you estimate the data will
grow?

If you can clarify these points, it will be easier to discuss specific
solutions.

Thanks,
Anup

On 20 April 2018 at 12:08, Aiman Parvaiz <ai...@steelhouse.com> wrote:

> Hi all
>
> I have been given a 15-node C* 2.2.8 cluster to manage which has a large
> KS (~800GB). Given the size of the KS, most management tasks, like repair,
> take a long time to complete, and disk space management is becoming tricky
> from the systems perspective.
>
>
> This KS size is going to grow in the future, and we have a business
> requirement for long data retention. I wanted to share this with all of you
> and ask what my options are: what would be the best way to deal with a KS
> of this size? To make the situation even trickier, low IO latency is also
> expected from this cluster.
>
>
> Thanks in advance for any suggestions/advice.
>
>
>
>

Re: Large size KS management

Posted by Oleksandr Shulgin <ol...@zalando.de>.
On Fri, Apr 20, 2018 at 4:08 AM, Aiman Parvaiz <ai...@steelhouse.com> wrote:

> Hi all
>
> I have been given a 15-node C* 2.2.8 cluster to manage which has a large
> KS (~800GB).
>

Is this per node or in total?


> Given the size of the KS, most management tasks, like repair, take a long
> time to complete, and disk space management is becoming tricky from the
> systems perspective.
>

Please quantify "long".  We had a 12-node 2.1 cluster with ~60 TB total,
and one repair of the full ring (using cassandra-reaper) was taking about
3-4 weeks(!).  We were using the default number of vnodes, 256.
We have since migrated to 30 nodes on Cassandra 3.0 with only 16 vnodes,
and the same full repair now takes under 5 days.
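
For reference, a rough sketch of the two knobs involved (the keyspace name is
a placeholder; num_tokens has to be set in cassandra.yaml before a node first
joins the ring, it cannot simply be changed on a live node):

  # cassandra.yaml on new/replacement nodes
  num_tokens: 16

  # full repair of a node's primary ranges, run once per node; Reaper
  # instead splits the ring into smaller segments and repairs them one by one
  nodetool repair -full -pr my_keyspace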

--
Alex