
Posted to user@cassandra.apache.org by Luis Miguel <ar...@hotmail.com> on 2015/11/02 20:21:51 UTC

FW: Two node cassandra cluster doubts

Hello!
I have set up a Cassandra cluster with two nodes, Node A and Node B, with RF=2, read CL=ONE and write CL=ONE.
Node A is the seed.

At first everything works well: when I add, delete or update entries on Node A, everything is replicated to Node B and vice versa. Even if I shut down Node A, make new insertions on Node B in the meantime, and then start Node A again, Cassandra recovers fine. BUT there is ONE case where this fails. I am going to describe the process:
Node A and Node B are sync.
Select Count (*) From MYTABLE;---> 10 rows
Shut down Node A.
Made some inserts on Node B.
Select Count (*) From MYTABLE;---> 15 rows
Shut down Node B.
Start Up Node B.
Select Count (*) From MYTABLE;---> 15 rows
(Everything Ok, yet).
Start Up Node A.
Select Count (*) From MYTABLE;---> 10 rows (uhmmm...this is weird...check it again)
Select Count (*) From MYTABLE;---> 15 rows (wow!..this is correct, let's try again)
Select Count (*) From MYTABLE;---> 10 rows (Ok...values are dancing)
If I run the same queries on Node B, it behaves the same way, and it is only fixed by a nodetool repair... but I would prefer an automatic fail-over.
Is there any way to avoid this, or is running nodetool repair mandatory?
Thanks in advance!!!

Re: Cassandra Cluster Doubts

Posted by Vladimir Yudovin <vl...@winguzone.com>.

Hi Luis,



I don't think it's possible to achieve this with a custom Snitch. As far as I understand, the Snitch only provides cluster topology, and connectivity is handled by another component/layer. Each cluster node should be able to connect to every other node, so I would stick with Michael's option a): "establish network communication for the entire cluster".



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






---- On Fri, 21 Apr 2017 15:42:17 -0400 Luis Miguel <arbox_@hotmail.com> wrote ----

Hi Michael!

Thanks for your answer I feared that was the answer...do you know if implementing my own Snitch would be possible to handle this situation?


Re: Cassandra Cluster Doubts

Posted by Luis Miguel <ar...@hotmail.com>.

Hi Michael!

Thanks for your answer. I feared that would be the answer... Do you know whether implementing my own Snitch could handle this situation?
________________________________
From: Michael Shuler <ms...@pbandjelly.org> on behalf of Michael Shuler <mi...@pbandjelly.org>
Sent: Friday, 21 April 2017 19:16:43
To: user@cassandra.apache.org
Subject: Re: Cassandra Cluster Doubts

You have one cluster that is comprised of N nodes that may be
distributed in racks and data centers. All the nodes of your cluster
need to be able to communicate - they are one cluster.

I think your options would be to a) establish network communication for
the entire cluster, or b) set up a new cluster for DCR2 and sync data
snapshots of Keyspace2 in some manner, or c) figure out a second cluster
that contains the data centers that do have network connectivity and
adjust application to query the appropriate cluster. There may be some
other creative ideas that pop up.

--
Kind regards,
Michael


Re: Cassandra Cluster Doubts

Posted by Michael Shuler <mi...@pbandjelly.org>.

You have one cluster that is comprised of N nodes that may be
distributed in racks and data centers. All the nodes of your cluster
need to be able to communicate - they are one cluster.

I think your options would be to a) establish network communication for
the entire cluster, or b) set up a new cluster for DCR2 and sync data
snapshots of Keyspace2 in some manner, or c) figure out a second cluster
that contains the data centers that do have network connectivity and
adjust application to query the appropriate cluster. There may be some
other creative ideas that pop up.
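
To make the "all nodes need to be able to communicate" requirement concrete, here is a minimal Python sketch that checks a reachability map for missing links between data centers. The map is an assumption reconstructed from the description in this thread (DCR2 can reach DC2 and DCR1 but not DC1), and `REACHABLE` and `missing_links` are hypothetical names used only for illustration. It does not talk to Cassandra at all.

```python
from itertools import combinations

# Assumed reachability map, reconstructed from the thread:
# DCR2 can reach DC2 and DCR1, but never DC1 (and vice versa).
REACHABLE = {
    "DC1": {"DC2", "DCR1"},
    "DC2": {"DC1", "DCR1", "DCR2"},
    "DCR1": {"DC1", "DC2", "DCR2"},
    "DCR2": {"DC2", "DCR1"},
}


def missing_links(reachable):
    """Return the data-center pairs that cannot talk in both directions."""
    missing = []
    for a, b in combinations(sorted(reachable), 2):
        if b not in reachable[a] or a not in reachable[b]:
            missing.append((a, b))
    return missing


if __name__ == "__main__":
    # An empty list would mean the cluster forms a full mesh.
    print(missing_links(REACHABLE))
```

Any pair reported here is a link that option a) would have to establish before the data centers can behave as one cluster.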

-- 
Kind regards,
Michael

On 04/21/2017 07:26 AM, Luis Miguel wrote:
> Hello!
> 
> 
> I have three DC:
> 
> DC1 -> 3 nodes, Keyspace1:3
> DC2 -> 3 nodes, Keyspace2:3
> DCR1 -> 3 nodes,  Keyspace1:2, Keyspace2:2
> 
> now I am trying to add a new datacenter to the cluster:
> 
> DCR2-> 1 node (by now), Keyspace2:1 which network configuration can
> access to  DC2 and DCR1 but it will never has access to DC1.
> 
> when I try to start the node in DCR2, it does everything right with
> Keyspace2...but Gossips DCR1 and DC1... and crashes with
> RuntimeException because it can't move data consistently from DC1 nodes
> (obviously I don't have network connection to those nodes from this
> datacenter)...
> when I try to use -Dcassandra.consistent.rangemovement= false
> option ...It also crashes with IllegalStateException: unable to find
> sufficient sources for streaming range......etc..etc..
> 
> It is possible to have that kind of topology in cassandra? I mean.. Can
> I have a cluster where some datacenters will never "connect" other
> datacenters?
> 
> Thanks in advance!!!

Cassandra Cluster Doubts

Posted by Luis Miguel <ar...@hotmail.com>.

Hello!

I have three DC:

DC1 -> 3 nodes, Keyspace1:3
DC2 -> 3 nodes, Keyspace2:3
DCR1 -> 3 nodes,  Keyspace1:2, Keyspace2:2

Now I am trying to add a new datacenter to the cluster:

DCR2 -> 1 node (for now), Keyspace2:1, whose network configuration can reach DC2 and DCR1 but will never have access to DC1.

When I try to start the node in DCR2, it does everything right with Keyspace2, but it gossips DCR1 and DC1 and crashes with a RuntimeException because it can't move data consistently from the DC1 nodes (obviously I don't have a network connection to those nodes from this datacenter).
When I try the -Dcassandra.consistent.rangemovement=false option, it also crashes, with IllegalStateException: unable to find sufficient sources for streaming range... etc.

Is it possible to have that kind of topology in Cassandra? I mean, can I have a cluster where some datacenters will never connect to other datacenters?

Thanks in advance!!!

Re: Two node cassandra cluster doubts

Posted by Bryan Cheng <br...@blockcypher.com>.

I believe what's going on here is this step:

Select Count (*) From MYTABLE;---> 15 rows

Shut down Node B.

Start Up Node B.

Select Count (*) From MYTABLE;---> 15 rows

To understand why this is an issue, consider the way that consistency is
attempted within Cassandra. With RF=2 (you should really use an odd RF
and LOCAL_QUORUM so you can tolerate a node failure, but that's another
thing), your write hits Node B and is queued for writing to Node A via a
process called hinted handoff. Normally, this handoff occurs when Node A
returns to the cluster, up to max_hint_window_in_ms later, causing all the
writes it missed to be replayed and integrated. However, since Node B
also goes down during this time period, it loses the queued hints and
therefore Node A never gets that write.
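
The sequence described above can be sketched as a toy model in Python. This is only an illustration of the explanation in this message, not Cassandra's actual implementation: the `Node` class, the `durable_hints` switch and the helper functions are all hypothetical, and whether hints really survive a restart depends on the Cassandra version and on how the node was stopped.

```python
class Node:
    """Toy replica: a set of row ids plus hints queued for a peer that was down."""

    def __init__(self, name, durable_hints):
        self.name = name
        self.up = True
        self.rows = set()
        self.hints = []  # (target_name, row) pairs waiting for delivery
        self.durable_hints = durable_hints

    def restart(self):
        # Assumption of this model: hints survive a restart only if durable.
        if not self.durable_hints:
            self.hints.clear()
        self.up = True


def write(coordinator, peer, row):
    """CL=ONE write with RF=2: store locally, hint the peer if it is down."""
    coordinator.rows.add(row)
    if peer.up:
        peer.rows.add(row)
    else:
        coordinator.hints.append((peer.name, row))


def replay_hints(coordinator, peer):
    """Deliver the hints queued for a peer once it is back up."""
    if not peer.up:
        return
    for target, row in coordinator.hints:
        if target == peer.name:
            peer.rows.add(row)
    coordinator.hints = [h for h in coordinator.hints if h[0] != peer.name]


def scenario(durable_hints):
    """Replay the steps from the original post; return (rows on A, rows on B)."""
    a, b = Node("A", durable_hints), Node("B", durable_hints)
    for row in range(10):       # A and B in sync with 10 rows
        write(a, b, row)
    a.up = False                # shut down Node A
    for row in range(10, 15):   # 5 inserts on Node B, hinted for A
        write(b, a, row)
    b.up = False                # shut down Node B: the whole cluster is down
    b.restart()
    a.restart()
    replay_hints(b, a)
    return len(a.rows), len(b.rows)


if __name__ == "__main__":
    print(scenario(durable_hints=False))
    print(scenario(durable_hints=True))
```

In this model the two replicas only diverge when the hints held by Node B are lost across its restart; if they survive, Node A catches up as soon as it returns.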

You may see this flip flopping due to your query hitting Node A and Node B
alternately (you can use trace to verify this).
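
A minimal sketch of that flip-flopping, assuming the reads are served by the replicas in strict round-robin order (a real driver's load-balancing policy may pick nodes differently). `count_reads` is a hypothetical helper, and the row counts are the ones reported in the original post.

```python
from itertools import cycle


def count_reads(replica_row_counts, n_queries):
    """Simulate n_queries CL=ONE counts, each answered by the next replica in turn."""
    order = cycle(sorted(replica_row_counts))
    return [replica_row_counts[next(order)] for _ in range(n_queries)]


if __name__ == "__main__":
    # Node A missed the five inserts, Node B has them.
    print(count_reads({"A": 10, "B": 15}, 3))
```

With CL=ONE each answer reflects only the single replica that served it, so the count follows whichever node happens to answer.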

Keep in mind that due to Cassandra's architecture, missing writes will
result in inconsistent data. There are mechanisms to help mitigate this,
for example the aforementioned hinted handoff, or read repair. However, at
the end of the day the only way to ensure consistent data is a repair.
These mechanisms cannot operate reliably if the entire cluster goes down,
which happens in your scenario between the above steps.
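
For the RF and consistency-level remark earlier in this message, the arithmetic can be written out as a short Python sketch. The function names are hypothetical; the formulas are the usual quorum rule (a quorum is a majority of replicas) and the overlap condition R + W > RF.

```python
def quorum(rf):
    """Number of replicas that make up a majority for a given replication factor."""
    return rf // 2 + 1


def tolerated_failures(rf):
    """Replicas that can be down while quorum reads and writes still succeed."""
    return rf - quorum(rf)


def reads_overlap_writes(rf, write_replicas, read_replicas):
    """True when every read set must share a replica with every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf


if __name__ == "__main__":
    for rf in (2, 3):
        print(rf, quorum(rf), tolerated_failures(rf))
    # Setup from the original post: RF=2, write CL=ONE, read CL=ONE.
    print(reads_overlap_writes(2, 1, 1))
    # RF=3 with quorum reads and writes.
    print(reads_overlap_writes(3, quorum(3), quorum(3)))
```

With RF=2 a quorum needs both replicas, so no node can be down, while ONE/ONE gives no guarantee that a read touches the replica that took the write. RF=3 with quorum reads and writes tolerates one node down and still overlaps.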

On Mon, Nov 2, 2015 at 12:46 PM, Luis Miguel <ar...@hotmail.com> wrote:

> Thanks for your answer!
> I thought that bootstrapping is executed only when you add a node to the
> cluster for the first time; after that, I thought that gossip is the method
> used to discover the cluster members again... In my case I thought that it
> was more about a read repair issue. Am I wrong?
>

RE: Two node cassandra cluster doubts

Posted by Luis Miguel <ar...@hotmail.com>.

Thanks for your answer! 
I thought that bootstrapping is executed only when you add a node to the cluster for the first time; after that, I thought that gossip is the method used to discover the cluster members again... In my case I thought that it was more about a read repair issue. Am I wrong?

Date: Mon, 2 Nov 2015 21:12:20 +0100
Subject: Re: FW: Two node cassandra cluster doubts
From: ichi.sara@gmail.com
To: user@cassandra.apache.org

I think this is normal behaviour, as you shut down your seed and then restarted it. You should know that when you start a seed node it doesn't do the bootstrapping thing, which means it doesn't check whether there are changes in the contents of the tables. In your tests, you shut down node A before doing the inserts and started it afterwards, so node A doesn't have the new rows you inserted. And yes, it is normal to get different values from your query each time, because the coordinator node changes and therefore the query is executed on a different node each time (when node B answers you get 15 rows, and when node A does you get 10 rows).

Re: FW: Two node cassandra cluster doubts

Posted by ICHIBA Sara <ic...@gmail.com>.

I think this is normal behaviour, as you shut down your seed and then
restarted it. You should know that when you start a seed node it doesn't do
the bootstrapping thing, which means it doesn't check whether there are
changes in the contents of the tables. In your tests, you shut down node A
before doing the inserts and started it afterwards, so node A doesn't have
the new rows you inserted. And yes, it is normal to get different values
from your query each time, because the coordinator node changes and therefore
the query is executed on a different node each time (when node B answers
you get 15 rows, and when node A does you get 10 rows).