Posted to user@cassandra.apache.org by Yan Chunlu <sp...@gmail.com> on 2011/07/08 17:50:42 UTC

how large cassandra could scale when it need to do manual operation?

hi, all:
I am curious about how large Cassandra can scale.

from the information I can find, the largest deployment is at Facebook, which
is about 150 nodes. meanwhile they are running 2000+ nodes of Hadoop, and
Yahoo is even running 4000 Hadoop nodes.

I do not understand why that is the situation; I only have a little knowledge
of Cassandra and no knowledge of Hadoop at all.



currently I am using cassandra with 3 nodes and having problems bringing one
back after it fell out of sync. the problems I encountered make me worry about
how cassandra could scale out:

1): load balancing needs to be performed manually on every node, assigning
initial tokens according to:

def tokens(nodes):
    for x in xrange(nodes):
        print 2 ** 127 / nodes * x
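The helper above is Python 2; as a quick sketch, the same token math in Python 3 looks like this (the 3-node example is illustrative; for RandomPartitioner the token space is [0, 2**127)):

```python
# Python 3 sketch of the token-assignment helper quoted above.
# Evenly spaced initial tokens balance a RandomPartitioner ring.
def tokens(nodes):
    return [2 ** 127 // nodes * x for x in range(nodes)]

# e.g. candidate initial_token values for a 3-node cluster:
for t in tokens(3):
    print(t)
```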



2): when adding new nodes, repair and cleanup need to be run on every
node



3) when decommissioning a node, there is a chance it slows down the entire
cluster (not sure why, but I have seen people asking about it), and the only
workaround is to shut down the entire cluster, rsync the data, and start all
nodes without the decommissioned one.





after all, I think there is a lot of human work required to maintain the
cluster, which would make it impossible to scale to thousands of nodes, but I
hope I am totally wrong about all of this. currently I am serving 1 million pv
every day with Cassandra and it makes me feel unsafe; I am afraid that one day
a node crash will corrupt the data and the whole cluster will go wrong....



on the contrary, a relational database makes me feel safe, but it does not
scale well.



thanks for any guidance here.

Re: how large cassandra could scale when it need to do manual operation?

Posted by Yan Chunlu <sp...@gmail.com>.
thanks for the information Chris.

that's very much like what I am going to do, though not with as many nodes as
yours.
do you place the nodes in the same datacenter?
could you give more information about the latency between your datacenters?
and also the replica_placement_strategy: do you use the
"cassandra-topology.properties" file to maintain the node list? thanks!

maybe I worried too much about disaster tolerance...


On Sat, Jul 9, 2011 at 5:01 PM, Chris Goffinet <cg...@chrisgoffinet.com> wrote:

> As mentioned by Aaron, yes we run hundreds of Cassandra nodes across
> multiple clusters. We run with RF of 2 and 3 (most common).
>
> We use commodity hardware and see failure all the time at this scale. We've
> never had 3 nodes that were in same replica set, fail all at once. We
> mitigate risk by being rack diverse, using different vendors for our hard
> drives, designed workflows to make sure machines get serviced in certain
> time windows and have an extensive automated burn-in process of (disk,
> memory, drives) to not roll out nodes/clusters that could fail right away.
>
>
> On Sat, Jul 9, 2011 at 12:17 AM, Yan Chunlu <sp...@gmail.com> wrote:
>
>> thank you very much for the reply. which brings me more confidence on
>> cassandra.
>> I will try the automation tools, the examples you've listed seems quite
>> promising!
>>
>>
>> about the decommission problem, here is the link:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html
>>  I am also trying to deploy cassandra across two datacenters(with 20ms
>> latency). so I am worrying about the network latency will even make it
>> worse.
>>
>> maybe I was misunderstanding the replication factor, doesn't it RF=3 means
>> I could lose two nodes and still have one available(with 100% of the keys),
>> once Nodes>=3?   besides I am not sure what's twitters setting on RF, but it
>> is possible to lose 3 nodes in the same time(facebook once encountered photo
>> loss because there RAID broken, rarely happen though). I have the strong
>> willing to set RF to a very high value...
>>
>> Thanks!
>>
>>
>> On Sat, Jul 9, 2011 at 5:22 AM, aaron morton <aa...@thelastpickle.com>wrote:
>>
>>> AFAIK Facebook Cassandra and Apache Cassandra diverged paths a long time
>>> ago. Twitter is a vocal supporter with a large Apache Cassandra install,
>>> e.g. "Twitter currently runs a couple hundred Cassandra nodes across a half
>>> dozen clusters. "
>>> http://www.datastax.com/2011/06/chris-goffinet-of-twitter-to-speak-at-cassandra-sf-2011
>>>
>>>
>>>
>>> If you are working with a 3 node cluster, removing/rebuilding/whatever on
>>> one node will affect 33% of your capacity. When you scale up, the
>>> contribution from each individual node goes down, and the impact of one
>>> node going down is less. Problems that happen with a few nodes will go
>>> away at scale, to be replaced by a whole set of new ones.
>>>
>>>
>>> 1):  the load balance need to manually performed on every node, according
>>> to:
>>>
>>> Yes
>>>
>>> 2): when adding new nodes, need to perform node repair and cleanup on
>>> every node
>>>
>>> You only need to run cleanup, see
>>> http://wiki.apache.org/cassandra/Operations#Bootstrap
>>>
>>> 3) when decommission a node, there is a chance that slow down the entire
>>> cluster. (not sure why but I saw people ask around about it.) and the only
>>> way to do is shutdown the entire the cluster, rsync the data, and start all
>>> nodes without the decommission one.
>>>
>>> I cannot remember any specific cases where decommission requires a full
>>> cluster stop, do you have a link? With regard to slowing down, the
>>> decommission process will stream data from the node you are removing onto
>>> the other nodes this can slow down the target node (I think it's more
>>> intelligent now about what is moved). This will be exaggerated in a 3 node
>>> cluster as you are removing 33% of the processing and adding some
>>> (temporary) extra load to the remaining nodes.
>>>
>>> after all, I think there is alot of human work to do to maintain the
>>> cluster which make it impossible to scale to thousands of nodes,
>>>
>>> Automation, Automation, Automation is the only way to go.
>>>
>>> Chef, Puppet, CF Engine for general config and deployment; Cloud Kick,
>>> munin, ganglia etc for monitoring. And
>>> Ops Centre (http://www.datastax.com/products/opscenter) for cassandra
>>> specific management.
>>>
>>> I am totally wrong about all of this, currently I am serving 1 millions
>>> pv every day with Cassandra and it make me feel unsafe, I am afraid one day
>>> one node crash will cause the data broken and all cluster goes wrong....
>>>
>>> With RF3 and a 3Node cluster you have room to lose one node and the
>>> cluster will be up for 100% of the keys. While better than having to worry
>>> about *the* database server, it's still entry level fault tolerance. With RF
>>> 3 in a 6 Node cluster you can lose up to 2 nodes and still be up for 100% of
>>> the keys.
>>>
>>> Is there something you are specifically concerned about with your current
>>> installation ?
>>>
>>> Cheers
>>>
>>>   -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 8 Jul 2011, at 08:50, Yan Chunlu wrote:
>>>
>>> hi, all:
>>> I am curious about how large that Cassandra can scale?
>>>
>>> from the information I can get, the largest usage is at facebook, which
>>> is about 150 nodes.  in the mean time they are using 2000+ nodes with
>>> Hadoop, and yahoo even using 4000 nodes of Hadoop.
>>>
>>> I am not understand why is the situation, I only have  little knowledge
>>> with Cassandra and even no knowledge with Hadoop.
>>>
>>>
>>>
>>> currently I am using cassandra with 3 nodes and having problem bring one
>>> back after it out of sync, the problems I encountered making me worry about
>>> how cassandra could scale out:
>>>
>>> 1):  the load balance need to manually performed on every node, according
>>> to:
>>>
>>> def tokens(nodes):
>>>     for x in xrange(nodes):
>>>         print 2 ** 127 / nodes * x
>>>
>>>
>>>
>>> 2): when adding new nodes, need to perform node repair and cleanup on
>>> every node
>>>
>>>
>>>
>>> 3) when decommission a node, there is a chance that slow down the entire
>>> cluster. (not sure why but I saw people ask around about it.) and the only
>>> way to do is shutdown the entire the cluster, rsync the data, and start all
>>> nodes without the decommission one.
>>>
>>>
>>>
>>>
>>>
>>> after all, I think there is alot of human work to do to maintain the
>>> cluster which make it impossible to scale to thousands of nodes, but I hope
>>> I am totally wrong about all of this, currently I am serving 1 millions pv
>>> every day with Cassandra and it make me feel unsafe, I am afraid one day one
>>> node crash will cause the data broken and all cluster goes wrong....
>>>
>>>
>>>
>>> in the contrary, relational database make me feel safety but it does not
>>> scale well.
>>>
>>>
>>>
>>> thanks for any guidance here.
>>>
>>>
>>>
>>
>>
>> --
>> Charles
>>
>
>


-- 
Charles

Re: how large cassandra could scale when it need to do manual operation?

Posted by Yan Chunlu <sp...@gmail.com>.
I missed the consistency level part, thanks very much for the explanation.
that is clear enough.

On Sun, Jul 10, 2011 at 7:57 AM, aaron morton <aa...@thelastpickle.com>wrote:

> about the decommission problem, here is the link:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html
>
> The key part of that post is "and since the second node was under heavy
> load, and not enough ram, it was busy GCing and worked horribly slow" .
>
> maybe I was misunderstanding the replication factor, doesn't it RF=3 means
> I could lose two nodes and still have one available(with 100% of the keys),
> once Nodes>=3?
>
> When you start losing replicas the CL you use dictates if the cluster is
> still up for 100% of the keys. See
> http://thelastpickle.com/2011/06/13/Down-For-Me/
>
>  I have the strong willing to set RF to a very high value...
>
> As chris said 3 is about normal, it means the QUORUM CL is only 2 nodes.
>
> I am also trying to deploy cassandra across two datacenters(with 20ms
>> latency).
>>
> Look up LOCAL_QUORUM in the wiki
>
> Hope that helps.
>
>  -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 9 Jul 2011, at 02:01, Chris Goffinet wrote:
>
> As mentioned by Aaron, yes we run hundreds of Cassandra nodes across
> multiple clusters. We run with RF of 2 and 3 (most common).
>
> We use commodity hardware and see failure all the time at this scale. We've
> never had 3 nodes that were in same replica set, fail all at once. We
> mitigate risk by being rack diverse, using different vendors for our hard
> drives, designed workflows to make sure machines get serviced in certain
> time windows and have an extensive automated burn-in process of (disk,
> memory, drives) to not roll out nodes/clusters that could fail right away.
>
> On Sat, Jul 9, 2011 at 12:17 AM, Yan Chunlu <sp...@gmail.com> wrote:
>
>> thank you very much for the reply. which brings me more confidence on
>> cassandra.
>> I will try the automation tools, the examples you've listed seems quite
>> promising!
>>
>>
>> about the decommission problem, here is the link:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html
>>  I am also trying to deploy cassandra across two datacenters(with 20ms
>> latency). so I am worrying about the network latency will even make it
>> worse.
>>
>> maybe I was misunderstanding the replication factor, doesn't it RF=3 means
>> I could lose two nodes and still have one available(with 100% of the keys),
>> once Nodes>=3?   besides I am not sure what's twitters setting on RF, but it
>> is possible to lose 3 nodes in the same time(facebook once encountered photo
>> loss because there RAID broken, rarely happen though). I have the strong
>> willing to set RF to a very high value...
>>
>> Thanks!
>>
>>
>> On Sat, Jul 9, 2011 at 5:22 AM, aaron morton <aa...@thelastpickle.com>wrote:
>>
>>> AFAIK Facebook Cassandra and Apache Cassandra diverged paths a long time
>>> ago. Twitter is a vocal supporter with a large Apache Cassandra install,
>>> e.g. "Twitter currently runs a couple hundred Cassandra nodes across a half
>>> dozen clusters. "
>>> http://www.datastax.com/2011/06/chris-goffinet-of-twitter-to-speak-at-cassandra-sf-2011
>>>
>>>
>>>
>>> If you are working with a 3 node cluster, removing/rebuilding/whatever on
>>> one node will affect 33% of your capacity. When you scale up, the
>>> contribution from each individual node goes down, and the impact of one
>>> node going down is less. Problems that happen with a few nodes will go
>>> away at scale, to be replaced by a whole set of new ones.
>>>
>>>
>>> 1):  the load balance need to manually performed on every node, according
>>> to:
>>>
>>> Yes
>>>
>>> 2): when adding new nodes, need to perform node repair and cleanup on
>>> every node
>>>
>>> You only need to run cleanup, see
>>> http://wiki.apache.org/cassandra/Operations#Bootstrap
>>>
>>> 3) when decommission a node, there is a chance that slow down the entire
>>> cluster. (not sure why but I saw people ask around about it.) and the only
>>> way to do is shutdown the entire the cluster, rsync the data, and start all
>>> nodes without the decommission one.
>>>
>>> I cannot remember any specific cases where decommission requires a full
>>> cluster stop, do you have a link? With regard to slowing down, the
>>> decommission process will stream data from the node you are removing onto
>>> the other nodes this can slow down the target node (I think it's more
>>> intelligent now about what is moved). This will be exaggerated in a 3 node
>>> cluster as you are removing 33% of the processing and adding some
>>> (temporary) extra load to the remaining nodes.
>>>
>>> after all, I think there is alot of human work to do to maintain the
>>> cluster which make it impossible to scale to thousands of nodes,
>>>
>>> Automation, Automation, Automation is the only way to go.
>>>
>>> Chef, Puppet, CF Engine for general config and deployment; Cloud Kick,
>>> munin, ganglia etc for monitoring. And
>>> Ops Centre (http://www.datastax.com/products/opscenter) for cassandra
>>> specific management.
>>>
>>> I am totally wrong about all of this, currently I am serving 1 millions
>>> pv every day with Cassandra and it make me feel unsafe, I am afraid one day
>>> one node crash will cause the data broken and all cluster goes wrong....
>>>
>>> With RF3 and a 3Node cluster you have room to lose one node and the
>>> cluster will be up for 100% of the keys. While better than having to worry
>>> about *the* database server, it's still entry level fault tolerance. With RF
>>> 3 in a 6 Node cluster you can lose up to 2 nodes and still be up for 100% of
>>> the keys.
>>>
>>> Is there something you are specifically concerned about with your current
>>> installation ?
>>>
>>> Cheers
>>>
>>>   -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 8 Jul 2011, at 08:50, Yan Chunlu wrote:
>>>
>>> hi, all:
>>> I am curious about how large that Cassandra can scale?
>>>
>>> from the information I can get, the largest usage is at facebook, which
>>> is about 150 nodes.  in the mean time they are using 2000+ nodes with
>>> Hadoop, and yahoo even using 4000 nodes of Hadoop.
>>>
>>> I am not understand why is the situation, I only have  little knowledge
>>> with Cassandra and even no knowledge with Hadoop.
>>>
>>>
>>>
>>> currently I am using cassandra with 3 nodes and having problem bring one
>>> back after it out of sync, the problems I encountered making me worry about
>>> how cassandra could scale out:
>>>
>>> 1):  the load balance need to manually performed on every node, according
>>> to:
>>>
>>> def tokens(nodes):
>>>     for x in xrange(nodes):
>>>         print 2 ** 127 / nodes * x
>>>
>>>
>>>
>>> 2): when adding new nodes, need to perform node repair and cleanup on
>>> every node
>>>
>>>
>>>
>>> 3) when decommission a node, there is a chance that slow down the entire
>>> cluster. (not sure why but I saw people ask around about it.) and the only
>>> way to do is shutdown the entire the cluster, rsync the data, and start all
>>> nodes without the decommission one.
>>>
>>>
>>>
>>>
>>>
>>> after all, I think there is alot of human work to do to maintain the
>>> cluster which make it impossible to scale to thousands of nodes, but I hope
>>> I am totally wrong about all of this, currently I am serving 1 millions pv
>>> every day with Cassandra and it make me feel unsafe, I am afraid one day one
>>> node crash will cause the data broken and all cluster goes wrong....
>>>
>>>
>>>
>>> in the contrary, relational database make me feel safety but it does not
>>> scale well.
>>>
>>>
>>>
>>> thanks for any guidance here.
>>>
>>>
>>>
>>
>>
>> --
>> Charles
>>
>
>
>


-- 
Charles

Re: how large cassandra could scale when it need to do manual operation?

Posted by aaron morton <aa...@thelastpickle.com>.
> about the decommission problem, here is the link:  http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html
The key part of that post is "and since the second node was under heavy load, and not enough ram, it was busy GCing and worked horribly slow" . 

> maybe I was misunderstanding the replication factor, doesn't it RF=3 means I could lose two nodes and still have one available(with 100% of the keys), once Nodes>=3?
When you start losing replicas the CL you use dictates if the cluster is still up for 100% of the keys. See http://thelastpickle.com/2011/06/13/Down-For-Me/ 

>  I have the strong willing to set RF to a very high value...
As chris said 3 is about normal, it means the QUORUM CL is only 2 nodes. 

> I am also trying to deploy cassandra across two datacenters(with 20ms latency).

Look up LOCAL_QUORUM in the wiki.
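As a rough illustration of the quorum arithmetic behind "RF 3 means QUORUM is only 2 nodes" (plain Python, not Cassandra code; the helper names are made up):

```python
# Sketch of quorum arithmetic: QUORUM needs a majority of the RF replicas.
def quorum(rf):
    return rf // 2 + 1

def tolerable_failures(rf):
    # Replicas that can be down while QUORUM reads/writes still succeed.
    return rf - quorum(rf)

# RF=3: QUORUM is 2 nodes, so one replica of any key can be lost.
print(quorum(3), tolerable_failures(3))
```

Note that raising RF also raises the quorum size, so a very high RF does not buy proportionally more availability at QUORUM.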

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 9 Jul 2011, at 02:01, Chris Goffinet wrote:

> As mentioned by Aaron, yes we run hundreds of Cassandra nodes across multiple clusters. We run with RF of 2 and 3 (most common). 
> 
> We use commodity hardware and see failure all the time at this scale. We've never had 3 nodes that were in same replica set, fail all at once. We mitigate risk by being rack diverse, using different vendors for our hard drives, designed workflows to make sure machines get serviced in certain time windows and have an extensive automated burn-in process of (disk, memory, drives) to not roll out nodes/clusters that could fail right away.
> 
> On Sat, Jul 9, 2011 at 12:17 AM, Yan Chunlu <sp...@gmail.com> wrote:
> thank you very much for the reply. which brings me more confidence on cassandra.
> I will try the automation tools, the examples you've listed seems quite promising!
> 
> 
> about the decommission problem, here is the link:  http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html
>  I am also trying to deploy cassandra across two datacenters(with 20ms latency). so I am worrying about the network latency will even make it worse.  
> 
> maybe I was misunderstanding the replication factor, doesn't it RF=3 means I could lose two nodes and still have one available(with 100% of the keys), once Nodes>=3?   besides I am not sure what's twitters setting on RF, but it is possible to lose 3 nodes in the same time(facebook once encountered photo loss because there RAID broken, rarely happen though). I have the strong willing to set RF to a very high value...
> 
> Thanks!
> 
> 
> On Sat, Jul 9, 2011 at 5:22 AM, aaron morton <aa...@thelastpickle.com> wrote:
> AFAIK Facebook Cassandra and Apache Cassandra diverged paths a long time ago. Twitter is a vocal supporter with a large Apache Cassandra install, e.g. "Twitter currently runs a couple hundred Cassandra nodes across a half dozen clusters. " http://www.datastax.com/2011/06/chris-goffinet-of-twitter-to-speak-at-cassandra-sf-2011
> 
> 
> If you are working with a 3 node cluster, removing/rebuilding/whatever on one node will affect 33% of your capacity. When you scale up, the contribution from each individual node goes down, and the impact of one node going down is less. Problems that happen with a few nodes will go away at scale, to be replaced by a whole set of new ones.
> 
> 
>> 1):  the load balance need to manually performed on every node, according to: 
> 
> Yes
> 
>> 2): when adding new nodes, need to perform node repair and cleanup on every node 
> 
> You only need to run cleanup, see http://wiki.apache.org/cassandra/Operations#Bootstrap
> 
>> 3) when decommission a node, there is a chance that slow down the entire cluster. (not sure why but I saw people ask around about it.) and the only way to do is shutdown the entire the cluster, rsync the data, and start all nodes without the decommission one. 
> 
> I cannot remember any specific cases where decommission requires a full cluster stop, do you have a link? With regard to slowing down, the decommission process will stream data from the node you are removing onto the other nodes this can slow down the target node (I think it's more intelligent now about what is moved). This will be exaggerated in a 3 node cluster as you are removing 33% of the processing and adding some (temporary) extra load to the remaining nodes. 
> 
>> after all, I think there is alot of human work to do to maintain the cluster which make it impossible to scale to thousands of nodes, 
> 
> Automation, Automation, Automation is the only way to go. 
> 
> Chef, Puppet, CF Engine for general config and deployment; Cloud Kick, munin, ganglia etc for monitoring. And 
> Ops Centre (http://www.datastax.com/products/opscenter) for cassandra specific management.
> 
>> I am totally wrong about all of this, currently I am serving 1 millions pv every day with Cassandra and it make me feel unsafe, I am afraid one day one node crash will cause the data broken and all cluster goes wrong....
> 
> With RF3 and a 3Node cluster you have room to lose one node and the cluster will be up for 100% of the keys. While better than having to worry about *the* database server, it's still entry level fault tolerance. With RF 3 in a 6 Node cluster you can lose up to 2 nodes and still be up for 100% of the keys. 
> 
> Is there something you are specifically concerned about with your current installation ? 
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 8 Jul 2011, at 08:50, Yan Chunlu wrote:
> 
>> hi, all:
>> I am curious about how large that Cassandra can scale? 
>> 
>> from the information I can get, the largest usage is at facebook, which is about 150 nodes.  in the mean time they are using 2000+ nodes with Hadoop, and yahoo even using 4000 nodes of Hadoop. 
>> 
>> I am not understand why is the situation, I only have  little knowledge with Cassandra and even no knowledge with Hadoop. 
>> 
>> 
>> 
>> currently I am using cassandra with 3 nodes and having problem bring one back after it out of sync, the problems I encountered making me worry about how cassandra could scale out: 
>> 
>> 1):  the load balance need to manually performed on every node, according to: 
>> 
>> def tokens(nodes):
>>     for x in xrange(nodes):
>>         print 2 ** 127 / nodes * x
>> 
>> 
>> 
>> 2): when adding new nodes, need to perform node repair and cleanup on every node 
>> 
>> 
>> 
>> 3) when decommission a node, there is a chance that slow down the entire cluster. (not sure why but I saw people ask around about it.) and the only way to do is shutdown the entire the cluster, rsync the data, and start all nodes without the decommission one. 
>> 
>> 
>> 
>> 
>> 
>> after all, I think there is alot of human work to do to maintain the cluster which make it impossible to scale to thousands of nodes, but I hope I am totally wrong about all of this, currently I am serving 1 millions pv every day with Cassandra and it make me feel unsafe, I am afraid one day one node crash will cause the data broken and all cluster goes wrong.... 
>> 
>> 
>> 
>> in the contrary, relational database make me feel safety but it does not scale well. 
>> 
>> 
>> 
>> thanks for any guidance here.
>> 
> 
> 
> 
> 
> -- 
> Charles
> 


Re: how large cassandra could scale when it need to do manual operation?

Posted by Chris Goffinet <cg...@chrisgoffinet.com>.
As mentioned by Aaron, yes we run hundreds of Cassandra nodes across
multiple clusters. We run with RF of 2 and 3 (most common).

We use commodity hardware and see failure all the time at this scale. We've
never had 3 nodes that were in same replica set, fail all at once. We
mitigate risk by being rack diverse, using different vendors for our hard
drives, designed workflows to make sure machines get serviced in certain
time windows and have an extensive automated burn-in process of (disk,
memory, drives) to not roll out nodes/clusters that could fail right away.

On Sat, Jul 9, 2011 at 12:17 AM, Yan Chunlu <sp...@gmail.com> wrote:

> thank you very much for the reply. which brings me more confidence on
> cassandra.
> I will try the automation tools, the examples you've listed seems quite
> promising!
>
>
> about the decommission problem, here is the link:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html
>  I am also trying to deploy cassandra across two datacenters(with 20ms
> latency). so I am worrying about the network latency will even make it
> worse.
>
> maybe I was misunderstanding the replication factor, doesn't it RF=3 means
> I could lose two nodes and still have one available(with 100% of the keys),
> once Nodes>=3?   besides I am not sure what's twitters setting on RF, but it
> is possible to lose 3 nodes in the same time(facebook once encountered photo
> loss because there RAID broken, rarely happen though). I have the strong
> willing to set RF to a very high value...
>
> Thanks!
>
>
> On Sat, Jul 9, 2011 at 5:22 AM, aaron morton <aa...@thelastpickle.com>wrote:
>
>> AFAIK Facebook Cassandra and Apache Cassandra diverged paths a long time
>> ago. Twitter is a vocal supporter with a large Apache Cassandra install,
>> e.g. "Twitter currently runs a couple hundred Cassandra nodes across a half
>> dozen clusters. "
>> http://www.datastax.com/2011/06/chris-goffinet-of-twitter-to-speak-at-cassandra-sf-2011
>>
>>
>>
>> If you are working with a 3 node cluster, removing/rebuilding/whatever on
>> one node will affect 33% of your capacity. When you scale up, the
>> contribution from each individual node goes down, and the impact of one
>> node going down is less. Problems that happen with a few nodes will go
>> away at scale, to be replaced by a whole set of new ones.
>>
>>
>> 1):  the load balance need to manually performed on every node, according
>> to:
>>
>> Yes
>>
>> 2): when adding new nodes, need to perform node repair and cleanup on
>> every node
>>
>> You only need to run cleanup, see
>> http://wiki.apache.org/cassandra/Operations#Bootstrap
>>
>> 3) when decommission a node, there is a chance that slow down the entire
>> cluster. (not sure why but I saw people ask around about it.) and the only
>> way to do is shutdown the entire the cluster, rsync the data, and start all
>> nodes without the decommission one.
>>
>> I cannot remember any specific cases where decommission requires a full
>> cluster stop, do you have a link? With regard to slowing down, the
>> decommission process will stream data from the node you are removing onto
>> the other nodes this can slow down the target node (I think it's more
>> intelligent now about what is moved). This will be exaggerated in a 3 node
>> cluster as you are removing 33% of the processing and adding some
>> (temporary) extra load to the remaining nodes.
>>
>> after all, I think there is alot of human work to do to maintain the
>> cluster which make it impossible to scale to thousands of nodes,
>>
>> Automation, Automation, Automation is the only way to go.
>>
>> Chef, Puppet, CF Engine for general config and deployment; Cloud Kick,
>> munin, ganglia etc for monitoring. And
>> Ops Centre (http://www.datastax.com/products/opscenter) for cassandra
>> specific management.
>>
>> I am totally wrong about all of this, currently I am serving 1 millions pv
>> every day with Cassandra and it make me feel unsafe, I am afraid one day one
>> node crash will cause the data broken and all cluster goes wrong....
>>
>> With RF3 and a 3Node cluster you have room to lose one node and the
>> cluster will be up for 100% of the keys. While better than having to worry
>> about *the* database server, it's still entry level fault tolerance. With RF
>> 3 in a 6 Node cluster you can lose up to 2 nodes and still be up for 100% of
>> the keys.
>>
>> Is there something you are specifically concerned about with your current
>> installation ?
>>
>> Cheers
>>
>>   -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 8 Jul 2011, at 08:50, Yan Chunlu wrote:
>>
>> hi, all:
>> I am curious about how large that Cassandra can scale?
>>
>> from the information I can get, the largest usage is at facebook, which is
>> about 150 nodes.  in the mean time they are using 2000+ nodes with Hadoop,
>> and yahoo even using 4000 nodes of Hadoop.
>>
>> I am not understand why is the situation, I only have  little knowledge
>> with Cassandra and even no knowledge with Hadoop.
>>
>>
>>
>> currently I am using cassandra with 3 nodes and having problem bring one
>> back after it out of sync, the problems I encountered making me worry about
>> how cassandra could scale out:
>>
>> 1):  the load balance need to manually performed on every node, according
>> to:
>>
>> def tokens(nodes):
>>     for x in xrange(nodes):
>>         print 2 ** 127 / nodes * x
>>
>>
>>
>> 2): when adding new nodes, need to perform node repair and cleanup on
>> every node
>>
>>
>>
>> 3) when decommission a node, there is a chance that slow down the entire
>> cluster. (not sure why but I saw people ask around about it.) and the only
>> way to do is shutdown the entire the cluster, rsync the data, and start all
>> nodes without the decommission one.
>>
>>
>>
>>
>>
>> after all, I think there is alot of human work to do to maintain the
>> cluster which make it impossible to scale to thousands of nodes, but I hope
>> I am totally wrong about all of this, currently I am serving 1 millions pv
>> every day with Cassandra and it make me feel unsafe, I am afraid one day one
>> node crash will cause the data broken and all cluster goes wrong....
>>
>>
>>
>> in the contrary, relational database make me feel safety but it does not
>> scale well.
>>
>>
>>
>> thanks for any guidance here.
>>
>>
>>
>
>
> --
> Charles
>

Re: how large cassandra could scale when it need to do manual operation?

Posted by Yan Chunlu <sp...@gmail.com>.
thank you very much for the reply. which brings me more confidence on
cassandra.
I will try the automation tools, the examples you've listed seems quite
promising!


about the decommission problem, here is the link:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html
I am also trying to deploy Cassandra across two datacenters (with 20 ms
latency), so I worry that the network latency will make things even worse.

maybe I misunderstood the replication factor: doesn't RF=3 mean I could lose
two nodes and still have one replica available (with 100% of the keys), as
long as Nodes >= 3? Besides, I am not sure what Twitter's RF setting is, but
it is possible to lose 3 nodes at the same time (Facebook once lost photos
because their RAID broke, though that rarely happens). I have a strong urge
to set RF to a very high value...

Thanks!


On Sat, Jul 9, 2011 at 5:22 AM, aaron morton <aa...@thelastpickle.com>wrote:

> AFAIK Facebook Cassandra and Apache Cassandra diverged paths a long time
> ago. Twitter is a vocal supporter with a large Apache Cassandra install,
> e.g. "Twitter currently runs a couple hundred Cassandra nodes across a half
> dozen clusters. "
> http://www.datastax.com/2011/06/chris-goffinet-of-twitter-to-speak-at-cassandra-sf-2011
>
>
>
> If you are working with a 3-node cluster, removing/rebuilding/whatever on
> one node will affect 33% of your capacity. When you scale up, the
> contribution from each individual node goes down, and the impact of one
> node going down is less. Problems that happen with a few nodes will go
> away at scale, to be replaced by a whole set of new ones.
>
>
> 1): the load balancing needs to be performed manually on every node,
> according to:
>
> Yes
>
> 2): when adding new nodes, repair and cleanup need to be performed on
> every node
>
> You only need to run cleanup, see
> http://wiki.apache.org/cassandra/Operations#Bootstrap
>
> 3): when decommissioning a node, there is a chance it will slow down the
> entire cluster (not sure why, but I have seen people ask about it), and
> the only workaround is to shut down the entire cluster, rsync the data,
> and start all nodes without the decommissioned one.
>
> I cannot remember any specific cases where decommission requires a full
> cluster stop; do you have a link? With regard to slowing down, the
> decommission process streams data from the node you are removing onto the
> other nodes, and this can slow down the target nodes (I think it's more
> intelligent now about what is moved). This will be exaggerated in a 3-node
> cluster, as you are removing 33% of the processing and adding some
> (temporary) extra load to the remaining nodes.
>
> after all, I think there is a lot of human work needed to maintain the
> cluster, which makes it impossible to scale to thousands of nodes,
>
> Automation, Automation, Automation is the only way to go.
>
> Chef, Puppet, CF Engine for general config and deployment; Cloud Kick,
> munin, ganglia etc for monitoring. And
> Ops Centre (http://www.datastax.com/products/opscenter) for cassandra
> specific management.
>
> I am totally wrong about all of this, currently I am serving 1 million
> page views every day with Cassandra and it makes me feel unsafe; I am
> afraid one day a single node crash will corrupt the data and bring the
> whole cluster down....
>
> With RF 3 and a 3-node cluster you have room to lose one node and the
> cluster will be up for 100% of the keys. While better than having to worry
> about *the* database server, it's still entry-level fault tolerance. With
> RF 3 in a 6-node cluster you can lose up to 2 nodes and still be up for
> 100% of the keys.
>
> Is there something you are specifically concerned about with your current
> installation ?
>
> Cheers
>
>   -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 8 Jul 2011, at 08:50, Yan Chunlu wrote:
>
> hi, all:
> I am curious about how large that Cassandra can scale?
>
> from the information I can get, the largest usage is at Facebook, which is
> about 150 nodes. In the meantime they are using 2000+ nodes with Hadoop,
> and Yahoo even uses 4000 Hadoop nodes.
>
> I do not understand why that is the case; I have only a little knowledge
> of Cassandra and no knowledge of Hadoop at all.
>
>
>
> currently I am using Cassandra with 3 nodes and having problems bringing
> one back after it fell out of sync; the problems I encountered make me
> worry about how Cassandra could scale out:
>
> 1): the load balancing needs to be performed manually on every node,
> according to:
>
> def tokens(nodes):
>     for x in xrange(nodes):
>         print 2 ** 127 / nodes * x
>
>
>
> 2): when adding new nodes, repair and cleanup need to be performed on
> every node
>
>
>
> 3): when decommissioning a node, there is a chance it will slow down the
> entire cluster (not sure why, but I have seen people ask about it), and
> the only workaround is to shut down the entire cluster, rsync the data,
> and start all nodes without the decommissioned one.
>
>
>
>
>
> after all, I think there is a lot of human work needed to maintain the
> cluster, which makes it impossible to scale to thousands of nodes. I hope
> I am totally wrong about all of this. Currently I am serving 1 million
> page views every day with Cassandra, and it makes me feel unsafe; I am
> afraid that one day a single node crash will corrupt the data and bring
> the whole cluster down....
>
>
>
> on the contrary, relational databases make me feel safe, but they do not
> scale well.
>
>
>
> thanks for any guidance here.
>
>
>


-- 
Charles

Re: how large cassandra could scale when it need to do manual operation?

Posted by aaron morton <aa...@thelastpickle.com>.
AFAIK Facebook Cassandra and Apache Cassandra diverged paths a long time ago. Twitter is a vocal supporter with a large Apache Cassandra install, e.g. "Twitter currently runs a couple hundred Cassandra nodes across a half dozen clusters. " http://www.datastax.com/2011/06/chris-goffinet-of-twitter-to-speak-at-cassandra-sf-2011


If you are working with a 3-node cluster, removing/rebuilding/whatever on one node will affect 33% of your capacity. When you scale up, the contribution from each individual node goes down, and the impact of one node going down is less. Problems that happen with a few nodes will go away at scale, to be replaced by a whole set of new ones.

> 1): the load balancing needs to be performed manually on every node, according to:

Yes
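For reference, the snippet quoted in 1) is Python 2 pseudocode with its indentation lost in transit; a runnable Python 3 sketch (note the integer division `//`, which keeps the tokens exact instead of rounding through a float) would be:

```python
def tokens(nodes):
    # Evenly spaced initial_token values for a RandomPartitioner ring,
    # whose token space runs from 0 to 2**127 - 1.
    return [(2 ** 127 // nodes) * x for x in range(nodes)]

# Example: print the initial_token for each node of a 3-node ring.
for node, token in enumerate(tokens(3)):
    print("node %d: initial_token = %d" % (node, token))
```

Each value goes into the corresponding node's initial_token setting (or is applied with nodetool move when rebalancing an existing ring).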
> 2): when adding new nodes, repair and cleanup need to be performed on every node
You only need to run cleanup, see http://wiki.apache.org/cassandra/Operations#Bootstrap
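As a sketch of that post-bootstrap pass (the host names are placeholders, and the `echo` makes this a dry run; remove it to actually invoke the command):

```shell
# Dry run: print the cleanup command for each pre-existing node.
# `nodetool cleanup` rewrites a node's SSTables, discarding keys that
# fall outside the token ranges the node owns after the topology change.
# Replace node1..node3 with your own hosts and drop `echo` to execute.
for host in node1 node2 node3; do
    echo nodetool -h "$host" cleanup
done
```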

> 3): when decommissioning a node, there is a chance it will slow down the entire cluster (not sure why, but I have seen people ask about it), and the only workaround is to shut down the entire cluster, rsync the data, and start all nodes without the decommissioned one.

I cannot remember any specific cases where decommission requires a full cluster stop; do you have a link? With regard to slowing down, the decommission process streams data from the node you are removing onto the other nodes, and this can slow down the target nodes (I think it's more intelligent now about what is moved). This will be exaggerated in a 3-node cluster, as you are removing 33% of the processing and adding some (temporary) extra load to the remaining nodes.

> after all, I think there is a lot of human work needed to maintain the cluster, which makes it impossible to scale to thousands of nodes,
Automation, Automation, Automation is the only way to go. 

Chef, Puppet, CF Engine for general config and deployment; Cloud Kick, munin, ganglia etc for monitoring. And 
Ops Centre (http://www.datastax.com/products/opscenter) for cassandra specific management.

> I am totally wrong about all of this, currently I am serving 1 million page views every day with Cassandra and it makes me feel unsafe; I am afraid one day a single node crash will corrupt the data and bring the whole cluster down....
With RF 3 and a 3-node cluster you have room to lose one node and the cluster will be up for 100% of the keys. While better than having to worry about *the* database server, it's still entry-level fault tolerance. With RF 3 in a 6-node cluster you can lose up to 2 nodes and still be up for 100% of the keys.
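To make that arithmetic concrete, here is a small sketch (hypothetical single-letter node names, and a simplified SimpleStrategy-style placement: a key lives on the RF distinct nodes that follow its token clockwise around the ring):

```python
from bisect import bisect_right

def replicas(key_token, ring, rf):
    # ring maps each node's token to its name; a key is stored on the
    # rf nodes starting at the first token past key_token, wrapping
    # around the end of the ring (SimpleStrategy-like placement).
    toks = sorted(ring)
    i = bisect_right(toks, key_token) % len(toks)
    return [ring[toks[(i + k) % len(toks)]] for k in range(min(rf, len(toks)))]

# 3-node ring with RF=3: every node holds every key, so one surviving
# node still has all the data, but losing a node costs 33% of capacity.
ring3 = {0: "A", 100: "B", 200: "C"}
print(replicas(150, ring3, 3))

# 6-node ring with RF=3: each key sits on 3 of the 6 nodes, so any two
# failures still leave at least one replica of every key.
ring6 = {t: n for t, n in zip(range(0, 600, 100), "ABCDEF")}
print(replicas(150, ring6, 3))
```

Whether the cluster stays "up" for a given key also depends on the consistency level: with one replica left, reads at CL.ONE succeed but QUORUM does not.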

Is there something you are specifically concerned about with your current installation ? 

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 Jul 2011, at 08:50, Yan Chunlu wrote:

> hi, all:
> I am curious about how large that Cassandra can scale? 
> 
> from the information I can get, the largest usage is at Facebook, which is about 150 nodes. In the meantime they are using 2000+ nodes with Hadoop, and Yahoo even uses 4000 Hadoop nodes.
> 
> I do not understand why that is the case; I have only a little knowledge of Cassandra and no knowledge of Hadoop at all.
> 
> 
> 
> currently I am using Cassandra with 3 nodes and having problems bringing one back after it fell out of sync; the problems I encountered make me worry about how Cassandra could scale out:
> 
> 1): the load balancing needs to be performed manually on every node, according to:
> 
> def tokens(nodes):
>     for x in xrange(nodes):
>         print 2 ** 127 / nodes * x
> 
> 
> 
> 2): when adding new nodes, repair and cleanup need to be performed on every node
> 
> 
> 
> 3): when decommissioning a node, there is a chance it will slow down the entire cluster (not sure why, but I have seen people ask about it), and the only workaround is to shut down the entire cluster, rsync the data, and start all nodes without the decommissioned one.
> 
> 
> 
> 
> 
> after all, I think there is a lot of human work needed to maintain the cluster, which makes it impossible to scale to thousands of nodes. I hope I am totally wrong about all of this. Currently I am serving 1 million page views every day with Cassandra, and it makes me feel unsafe; I am afraid that one day a single node crash will corrupt the data and bring the whole cluster down....
> 
> 
> 
> on the contrary, relational databases make me feel safe, but they do not scale well.
> 
> 
> 
> thanks for any guidance here.
>