Posted to user@cassandra.apache.org by Jabbar <aj...@gmail.com> on 2013/01/08 23:33:31 UTC

remote datacentre consistency

I'm a bit confused about how a two-datacentre Apache Cassandra cluster
keeps the data consistent.

From what I understand, a client application in datacentre1 contacts a
coordinator node, which sends the data to the local replicas and also
sends the updates to the remote coordinator in the remote datacentre.

Does the local coordinator send the updates asynchronously to the local
replicas and the remote coordinator node?

What happens if the bandwidth is severely restricted to the remote
datacentre? Do the updates for the remote coordinator keep getting buffered
up in the local coordinator?

What happens if the connection to the remote coordinator is down? Would
hinted handoff be used to recover from this scenario? What options are
there to synchronise the remote datacentre if the connectivity comes back
after a couple of days?


-- 
Thanks

 A Jabbar Azam

Re: remote datacentre consistency

Posted by Jabbar <aj...@gmail.com>.

Aaron,

Thank you for your answers.
On 10 Jan 2013 00:27, "aaron morton" <aa...@thelastpickle.com> wrote:

>> I thought Hinted Handoff was for downed replicas in the local datacentre.
>> I didn't realise that it would work with a remote datacentre.
>
> The coordinator will store a hint if it detects a replica is down
> before the request starts, or if the node did not respond within
> rpc_timeout.
>
>> Likewise for Anti Entropy I thought it only worked for the replicas in the
>> local datacentre. I have yet to find any definitive references which mention
>> that this works across multiple datacentres.
>
> It works on the cluster as a whole, paying attention to the replication
> settings.
>
> So if you have replicas in two DCs it will repair across them.
>
>> Does the local coordinator send the updates asynchronously to the local
>> replicas and the remote coordinator node?
>
> Yes. All internode communication is async.
>
>> What happens if the bandwidth is severely restricted to the remote
>> datacentre? Do the updates for the remote coordinator keep getting buffered
>> up in the local coordinator?
>
> There is a local queue of messages to send; if the messages are in the
> queue for more than rpc_timeout they will not be sent.
>
>> What happens if the connection to the remote coordinator is down?
>
> It depends on the CL you are using. If you are using CL QUORUM your writes
> will probably fail, depending on the RF settings. If you are using CL
> LOCAL_QUORUM they will work so long as there is a local quorum. If you are
> using EACH_QUORUM they will fail.
>
> Hope that helps.

Re: remote datacentre consistency

Posted by aaron morton <aa...@thelastpickle.com>.

> I thought Hinted Handoff was for downed replicas in the local datacentre. I didn't realise that it would work with a remote datacentre.

The coordinator will store a hint if it detects a replica is down before the request starts, or if the node did not respond within rpc_timeout.
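
As a rough illustration of that rule, here is a toy Python model (not Cassandra's actual code). The names ReplicaOutcome and should_store_hint and the 10-second timeout are made up for the example. Note that nothing in the decision looks at which datacentre the replica lives in.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed value for illustration only; the real setting lives in cassandra.yaml.
RPC_TIMEOUT_MS = 10_000


@dataclass
class ReplicaOutcome:
    """What the coordinator observed for one replica during a write (hypothetical type)."""
    known_down_before_request: bool
    response_time_ms: Optional[float]  # None means no response ever arrived


def should_store_hint(outcome: ReplicaOutcome,
                      rpc_timeout_ms: float = RPC_TIMEOUT_MS) -> bool:
    """Return True when the coordinator would keep a hint for this replica."""
    # Case 1: the replica was already marked down when the request started.
    if outcome.known_down_before_request:
        return True
    # Case 2: the replica looked alive but never answered.
    if outcome.response_time_ms is None:
        return True
    # Case 3: the replica answered, but only after rpc_timeout had passed.
    return outcome.response_time_ms > rpc_timeout_ms


if __name__ == "__main__":
    print(should_store_hint(ReplicaOutcome(True, None)))      # replica known to be down
    print(should_store_hint(ReplicaOutcome(False, 5.0)))      # replica answered quickly
    print(should_store_hint(ReplicaOutcome(False, 20_000.0))) # replica answered too late
```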

> Likewise for Anti Entropy I thought it only worked for the replicas in the local datacentre. I have yet to find any definitive references which mention that this works across multiple datacentres.
It works on the cluster as a whole, paying attention to the replication settings.

So if you have replicas in two DCs it will repair across them.

> Does the local coordinator send the updates asynchronously to the local replicas and the remote coordinator node?
> 

Yes. All internode communication is async.

> What happens if the bandwidth is severely restricted to the remote datacentre? Do the updates for the remote coordinator keep getting buffered up in the local coordinator?
> 

There is a local queue of messages to send; if the messages are in the queue for more than rpc_timeout they will not be sent.
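
The queue behaviour can be pictured with a small Python sketch. It is a simplified model under stated assumptions, not how Cassandra implements its messaging layer: the OutboundQueue class is hypothetical, and timestamps are passed in explicitly so the example is deterministic.

```python
from collections import deque


class OutboundQueue:
    """Toy outbound message queue that drops messages older than rpc_timeout."""

    def __init__(self, rpc_timeout_s: float):
        self.rpc_timeout_s = rpc_timeout_s
        self._pending = deque()  # entries are (enqueued_at, message)

    def enqueue(self, message, now: float) -> None:
        self._pending.append((now, message))

    def drain(self, now: float):
        """Empty the queue, returning (sendable, dropped) lists of messages."""
        sendable, dropped = [], []
        while self._pending:
            enqueued_at, message = self._pending.popleft()
            if now - enqueued_at > self.rpc_timeout_s:
                # Waited longer than rpc_timeout: never sent.
                dropped.append(message)
            else:
                sendable.append(message)
        return sendable, dropped


if __name__ == "__main__":
    q = OutboundQueue(rpc_timeout_s=10.0)
    q.enqueue("write-a", now=0.0)
    q.enqueue("write-b", now=8.0)
    # The slow link only frees up at t=12s, so "write-a" has waited 12s.
    print(q.drain(now=12.0))
```

So with a badly constrained link the backlog does not grow without bound in this model; old messages expire instead.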

> What happens if the connection to the remote coordinator is down?
> 

It depends on the CL you are using. If you are using CL QUORUM your writes will probably fail, depending on the RF settings. If you are using CL LOCAL_QUORUM they will work so long as there is a local quorum. If you are using EACH_QUORUM they will fail.
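
To make the "depending on the RF settings" part concrete, here is a minimal Python sketch that only counts replicas. The function write_succeeds, the datacentre names and the replica counts are all assumptions for illustration; it ignores timeouts, hints and everything else a real coordinator does.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of a replica count."""
    return replicas // 2 + 1


def write_succeeds(cl: str, rf_by_dc: dict, live_by_dc: dict, local_dc: str) -> bool:
    """Decide whether enough replicas are reachable for the given consistency level.

    rf_by_dc   -- replication factor per datacentre
    live_by_dc -- reachable replicas per datacentre
    local_dc   -- the coordinator's datacentre
    """
    # Never count more live replicas than the datacentre actually holds.
    live = {dc: min(live_by_dc.get(dc, 0), rf) for dc, rf in rf_by_dc.items()}
    if cl == "QUORUM":
        # Majority of all replicas, wherever they are.
        return sum(live.values()) >= quorum(sum(rf_by_dc.values()))
    if cl == "LOCAL_QUORUM":
        # Majority of the replicas in the coordinator's datacentre only.
        return live[local_dc] >= quorum(rf_by_dc[local_dc])
    if cl == "EACH_QUORUM":
        # Majority in every datacentre.
        return all(live[dc] >= quorum(rf) for dc, rf in rf_by_dc.items())
    raise ValueError(f"unsupported consistency level: {cl}")


if __name__ == "__main__":
    remote_down = {"dc1": 3, "dc2": 0}  # link to dc2 is down
    for rf in ({"dc1": 3, "dc2": 3}, {"dc1": 3, "dc2": 1}):
        for cl in ("QUORUM", "LOCAL_QUORUM", "EACH_QUORUM"):
            print(rf, cl, write_succeeds(cl, rf, remote_down, "dc1"))
```

With RF 3 in each datacentre, QUORUM needs 4 of 6 replicas, so losing the remote datacentre fails the write. With RF 3 locally and 1 remotely it needs 3 of 4 and still succeeds.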

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/01/2013, at 11:40 AM, Jabbar <aj...@gmail.com> wrote:

> Hello Simon,
> 
> I thought Hinted Handoff was for downed replicas in the local datacentre. I didn't realise that it would work with a remote datacentre.
> 
> Likewise for Anti Entropy I thought it only worked for the replicas in the local datacentre. I have yet to find any definitive references which mention that this works across multiple datacentres.
> 
> I'll keep looking. There are probably some documents I haven't read yet.

Re: remote datacentre consistency

Posted by Jabbar <aj...@gmail.com>.

Hello Simon,

I thought Hinted Handoff was for downed replicas in the local datacentre.
I didn't realise that it would work with a remote datacentre.

Likewise for Anti Entropy I thought it only worked for the replicas in the
local datacentre. I have yet to find any definitive references which mention
that this works across multiple datacentres.

I'll keep looking. There are probably some documents I haven't
read yet.


On 9 January 2013 18:38, Simon Guindon <si...@jsitelecom.com> wrote:

> Here’s a good document on how hinted handoff works
>
> http://www.datastax.com/dev/blog/modern-hinted-handoff
>
> I believe if I understand that document correctly that a hinted handoff
> will get created if the replica is down in the other data center. Also
> since Cassandra is self-healing, reads will cause read repairs to correct
> any inconsistent data.
>
> Also Cassandra has an anti-entropy mechanism that actively updates
> replicas to the newest version using a Merkle tree.
>
> Here’s some text on Anti-entropy
>
> http://wiki.apache.org/cassandra/AntiEntropy



-- 
Thanks

 A Jabbar Azam

RE: remote datacentre consistency

Posted by Simon Guindon <si...@jsitelecom.com>.

Here's a good document on how hinted handoff works
http://www.datastax.com/dev/blog/modern-hinted-handoff

I believe, if I understand that document correctly, that a hinted handoff will get created if the replica is down in the other data center. Also, since Cassandra is self-healing, reads will cause read repairs to correct any inconsistent data.

Also Cassandra has an anti-entropy mechanism that actively updates replicas to the newest version using a Merkle tree.
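
To show the general idea of comparing replicas with a Merkle tree, here is a small Python sketch. It illustrates the technique only and is not Cassandra's repair code: the helper names are made up, each leaf stands for one chunk of data held by a replica, and the leaf count is assumed to be a power of two.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves):
    """Build the tree bottom-up; levels[0] holds leaf hashes, levels[-1] the root."""
    level = [_h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_leaves(a, b):
    """Return indices of leaves that differ, descending only into mismatched subtrees."""
    if not a or len(a) != len(b) or len(a) & (len(a) - 1):
        raise ValueError("need equal, non-empty, power-of-two leaf counts")
    la, lb = merkle_levels(a), merkle_levels(b)
    suspects = [0]  # start at the root
    for depth in range(len(la) - 1, 0, -1):
        # Keep only nodes whose hashes disagree, then move to their children.
        mismatched = [i for i in suspects if la[depth][i] != lb[depth][i]]
        suspects = [child for i in mismatched for child in (2 * i, 2 * i + 1)]
    return [i for i in suspects if la[0][i] != lb[0][i]]


if __name__ == "__main__":
    dc1 = [b"k1=v1", b"k2=v2", b"k3=v3", b"k4=v4"]
    dc2 = [b"k1=v1", b"k2=v2", b"k3=stale", b"k4=v4"]
    print(differing_leaves(dc1, dc2))  # only the chunk at index 2 needs syncing
```

The point of the tree is that matching subtrees are skipped entirely, so two replicas that mostly agree only need to exchange the chunks that actually differ.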

Here's some text on Anti-entropy
http://wiki.apache.org/cassandra/AntiEntropy


From: Jabbar [mailto:ajazam@gmail.com]
Sent: January-08-13 5:34 PM
To: user@cassandra.apache.org
Subject: remote datacentre consistency

I'm a bit confused about how a two datacentre apache cassandra cluster keeps the data consistent.
From what I understand a client application in datacentre1 contacts a coordinator node which sends the data to the local replicas and it also sends the updates to the remote coordinator in the remote data centre.

Does the local coordinator send the updates asynchronously to the local replicas and the remote coordinator node?
What happens if the bandwidth is severely restricted to the remote datacentre? Do the updates for the remote coordinator keep getting buffered up in the local coordinator?
What happens if the connection to the remote coordinator is down? Would hinted hand off be used to recover from this scenario?  What options are there to synchronise the remote datacentre if the connectivity comes back after a couple of days?

--
Thanks

 A Jabbar Azam