
Posted to user@cassandra.apache.org by 赵豫峰 <zh...@easemob.com> on 2017/08/30 04:14:06 UTC

Why the cluster does not work well after adding two new nodes

Hi, I have a cluster with two nodes (I know this setup is wrong, but it was built by a colleague who has since left), and its keyspace is defined like this:


CREATE KEYSPACE my_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'}  AND durable_writes = true;


One day my boss said that one node had been down for a long time while the other was working normally, and told me to restart the cluster.


First, I took a snapshot of the working node.
Then I checked the number of rows with a select count(*) CQL statement; the result was more than 170000.
Next, I added two new nodes. After the new nodes came up, I ran select count(*) several times, but now the results were inconsistent, and each result was less than 10000. I checked node status with ./nodetool status, and every node was UN, but the load on the two new nodes was far lower than on the original node.
I stopped the two new nodes, ran "select count(*)" again, and got the correct result again.


I built a new cluster in a sandbox environment from the snapshot files and got the same result as above. I then ran "./nodetool repair", after which the cluster worked well, but I don't know why.


My guess is that two nodes with "replication = {'class': 'SimpleStrategy', 'replication_factor': '3'}" can cause a split brain so that the data becomes inconsistent, or that the data files are corrupted, but I am not sure. Why did this happen, why did I have to run "./nodetool repair", and when should it be used?
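
One possible explanation, not confirmed anywhere in this thread, is that the new nodes became replicas for token ranges without holding the existing rows, so a read at consistency level ONE (the cqlsh default) sometimes lands on an empty replica. The Python sketch below is a toy model of that idea only; NUM_RANGES, the replica layout, and the repair() helper are hypothetical simplifications, not Cassandra internals.

```python
import random

TOTAL_ROWS = 170_000   # roughly the count reported above
NUM_RANGES = 16        # hypothetical number of token ranges


def build_ranges():
    """Each toy token range has 3 replicas: the old node (full) and two new, empty nodes."""
    ranges = []
    for r in range(NUM_RANGES):
        rows = frozenset(range(r, TOTAL_ROWS, NUM_RANGES))
        ranges.append([rows, frozenset(), frozenset()])
    return ranges


def count_star(ranges, rng, read_all=False):
    """Toy SELECT count(*): per range, read one random replica (ONE)
    or the union of all replicas (ALL), then sum the row counts."""
    total = 0
    for replicas in ranges:
        if read_all:
            total += len(frozenset().union(*replicas))
        else:
            total += len(rng.choice(replicas))
    return total


def repair(ranges):
    """Toy stand-in for repair: every replica of a range ends up with the union."""
    repaired = []
    for replicas in ranges:
        merged = frozenset().union(*replicas)
        repaired.append([merged] * len(replicas))
    return repaired


rng = random.Random(42)
ranges = build_ranges()
before = [count_star(ranges, rng) for _ in range(5)]
all_level = count_star(ranges, rng, read_all=True)
after = [count_star(repair(ranges), rng) for _ in range(5)]

print("before repair, ONE:", before)     # varies from run to run of the query
print("before repair, ALL:", all_level)  # full count, every replica consulted
print("after repair, ONE:", after)       # full count on every read
```

In this model, stopping the two new nodes leaves only the old node to answer every range, which would match the correct count seen after shutting them down.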


Thanks!





------------------


赵豫峰



Easemob Instant Messaging Cloud / R&D

Re: Why the cluster does not work well after adding two new nodes

Posted by 赵豫峰 <zh...@easemob.com>.

Hi Nandan, thanks for your reply.


1) Yes, the configuration is wrong. What I don't understand is why I get the correct result when only one node is alive, but not when two or three nodes are alive.
2) I guess you mean the auto_bootstrap configuration parameter? It is not set in cassandra.yaml.
3) Is it necessary to rebalance when adding a new node? I remember that previously it was fine to just add a new node without any further operation.
 
Thanks for your advice!






 
 
------------------ Original ------------------
From:  "@Nandan@"<na...@gmail.com>;
Date:  Wed, Aug 30, 2017 01:04 PM
To:  "zhaoyf"<zh...@easemob.com>; "user"<us...@cassandra.apache.org>; 

Subject:  Re: Why the cluster does not work well after adding two new nodes

 
Hi, here is what went wrong from the start:
1) You had 2 nodes but created the keyspace with RF 3. [Always make sure RF <= total number of nodes.]
2) When adding new nodes, check whether auto-bootstrapping is enabled.
3) Once you have added the 2 new nodes, it is best to rebalance the cluster.
There are 2 ways to rebalance:
A) Use OpsCenter and select Rebalance Cluster.
B) Run nodetool cleanup on a node afterward to clean up data that no longer belongs to that node.



Best Regards, 
Nandan





Re: Why the cluster does not work well after adding two new nodes

Posted by "@Nandan@" <na...@gmail.com>.

Hi,
Here is what went wrong from the start:
1) You had 2 nodes but created the keyspace with RF 3. [Always make sure RF
<= total number of nodes.]
2) When adding new nodes, check whether auto-bootstrapping is enabled.
3) Once you have added the 2 new nodes, it is best to rebalance the
cluster.
There are 2 ways to rebalance:
A) Use OpsCenter and select Rebalance Cluster.
B) Run nodetool cleanup on a node afterward to clean up data that no longer
belongs to that node.
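
To make the RF arithmetic in point 1 concrete, here is a minimal Python sketch. QUORUM needs floor(RF / 2) + 1 replicas; the can_satisfy() helper is a hypothetical simplification that assumes every live node is a replica for the data being read, which holds only while the cluster has no more nodes than RF.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond for QUORUM: floor(RF / 2) + 1."""
    return rf // 2 + 1


def can_satisfy(required: int, rf: int, live_nodes: int) -> bool:
    """True if enough replicas can be alive to meet the requirement.

    At most min(rf, live_nodes) replicas of a row can be alive
    (simplification: every live node is assumed to be a replica).
    """
    return min(rf, live_nodes) >= required


RF = 3
LEVELS = {"ONE": 1, "QUORUM": quorum(RF), "ALL": RF}

for live in (1, 2, 3):
    ok = {name: can_satisfy(need, RF, live) for name, need in LEVELS.items()}
    print(f"live nodes = {live}: {ok}")
```

With RF 3 on a 2-node cluster, ALL can never be met, and QUORUM fails as soon as one node is down, so only reads and writes at ONE keep working in that state.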

Best Regards,
Nandan

