Posted to user@cassandra.apache.org by "Isaac Reath (BLOOMBERG/ 919 3RD A)" <ir...@bloomberg.net> on 2020/07/10 21:18:50 UTC

Running Large Clusters in Production

Hi All,

I’m currently dealing with a use case that runs on around 200 nodes. Due to growth of their product as well as the onboarding of additional data sources, we are looking at expanding to around 700 nodes, and potentially beyond 1000. To that end I have a couple of questions:

1) For those who have managed clusters at that scale, what types of operational challenges have you run into that you might not see when operating 100-node clusters? A couple come to mind: version upgrades (especially major versions) become a lot riskier, since a blue/green-style deployment of the database is no longer feasible, and backup & restore operations seem far more error-prone for the same reason (having to do an in-place restore instead of spinning up a new cluster to restore to).

2) Is there a cluster size beyond which sharding across multiple clusters becomes the recommended approach?

Thanks,
Isaac


Re: Running Large Clusters in Production

Posted by onmstester onmstester <on...@zoho.com.INVALID>.
Yes, you should handle the routing logic at the app level.

I wish there were another level of sharding (above DC and rack), a "cluster" level, to distribute data across multiple clusters! But I don't think any other database does such a thing for you either.
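To make the idea concrete, here is a minimal sketch of what that app-level routing could look like. This is a hypothetical illustration, not anything from the thread: the cluster names and the choice of hash are assumptions.

```python
import hashlib

# Hypothetical sketch: route each partition key to one of several
# independent Cassandra clusters. The cluster names are placeholders.
CLUSTERS = ["cluster-a", "cluster-b", "cluster-c", "cluster-d"]

def cluster_for(partition_key: str) -> str:
    # A stable hash, so the same key always routes to the same cluster.
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return CLUSTERS[int(digest, 16) % len(CLUSTERS)]
```

Every read and write would first go through a lookup like this to pick which cluster's driver session to use; note that moving keys between clusters later requires a data migration, which is one reason the shard count is usually fixed up front.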

Another problem with a big cluster is the huge number of messaging threads on each node: (CLUSTER_SIZE - 1) * (3 incoming + 3 outgoing). Even for 100 nodes that is about 600 threads per node. I wonder how some papers reported linear scalability for Cassandra even with more than 300 nodes (such as Netflix in 2011); shouldn't the overhead of the growing thread count on each node slow down the linear scalability?
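For what it's worth, plugging a few cluster sizes into that formula gives a quick sense of the growth:

```python
# Per-node messaging-thread estimate, using the
# (CLUSTER_SIZE - 1) * (3 incoming + 3 outgoing) model from above.
def threads_per_node(cluster_size: int) -> int:
    return (cluster_size - 1) * (3 + 3)

for n in (100, 300, 1000):
    print(f"{n} nodes -> {threads_per_node(n)} threads per node")
# 100 nodes -> 594, 300 nodes -> 1794, 1000 nodes -> 5994
```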





---- On Sat, 11 Jul 2020 06:18:33 +0430 Sergio <la...@gmail.com> wrote ----


Sorry for the dumb question:

When we refer to 1000 nodes divided into 10 clusters (shards), we would have 100 nodes per cluster.

A shard is not intended as a datacenter; it would be a cluster of its own that doesn't talk to the other ones, so there would have to be some routing logic at the application level to route requests to the correct cluster?

Is this the recommended approach?

Thanks

Re: Running Large Clusters in Production

Posted by Sergio <la...@gmail.com>.
Sorry for the dumb question:

When we refer to 1000 nodes divided into 10 clusters (shards), we would have
100 nodes per cluster.
A shard is not intended as a datacenter; it would be a cluster of its own
that doesn't talk to the other ones, so there would have to be some routing
logic at the application level to route requests to the correct cluster?
Is this the recommended approach?

Thanks



On Fri, Jul 10, 2020, 4:06 PM Jon Haddad <jo...@jonhaddad.com> wrote:

> I worked on a handful of large clusters (> 200 nodes) using vnodes, and
> there were some serious issues with both performance and availability.  We
> had to put in a LOT of work to fix the problems.
>
> I agree with Jeff - it's way better to manage multiple clusters than a
> really large one.

Re: Running Large Clusters in Production

Posted by Jon Haddad <jo...@jonhaddad.com>.
I worked on a handful of large clusters (> 200 nodes) using vnodes, and
there were some serious issues with both performance and availability.  We
had to put in a LOT of work to fix the problems.

I agree with Jeff - it's way better to manage multiple clusters than a
really large one.


On Fri, Jul 10, 2020 at 2:49 PM Jeff Jirsa <jj...@gmail.com> wrote:

> 1000 instances are fine if you're not using vnodes.
>
> I'm not sure what the limit is if you're using vnodes.
>
> If you might get to 1000, shard early before you get there. Running 8x100
> host clusters will be easier than one 800 host cluster.

Re: Running Large Clusters in Production

Posted by Jeff Jirsa <jj...@gmail.com>.
1000 instances are fine if you're not using vnodes.

I'm not sure what the limit is if you're using vnodes.

If you might get to 1000, shard early before you get there. Running 8x100
host clusters will be easier than one 800 host cluster.


On Fri, Jul 10, 2020 at 2:19 PM Isaac Reath (BLOOMBERG/ 919 3RD A) <
ireath@bloomberg.net> wrote:

> Hi All,
>
> I’m currently dealing with a use case that runs on around 200 nodes. Due
> to growth of their product as well as the onboarding of additional data
> sources, we are looking at expanding to around 700 nodes, and potentially
> beyond 1000. To that end I have a couple of questions:
>
> 1) For those who have managed clusters at that scale, what types of
> operational challenges have you run into that you might not see when
> operating 100-node clusters? A couple come to mind: version upgrades
> (especially major versions) become a lot riskier, since a blue/green-style
> deployment of the database is no longer feasible, and backup & restore
> operations seem far more error-prone for the same reason (having to do an
> in-place restore instead of spinning up a new cluster to restore to).
>
> 2) Is there a cluster size beyond which sharding across multiple clusters
> becomes the recommended approach?
>
> Thanks,
> Isaac
>
>