Posted to user@cassandra.apache.org by Stone Fang <cn...@gmail.com> on 2016/07/21 09:57:47 UTC

Multi-datacenter improvement

Hi all,
I have been thinking about an issue with Cassandra multi-datacenter deployments and have opened a ticket to track it. Your thoughts are welcome:
https://issues.apache.org/jira/browse/CASSANDRA-12257

*Environment*
Active-active Cassandra datacenters.
The write consistency level is set to LOCAL_QUORUM to keep request latency low.

*Concerns*
We don't know when the data arrives in the other datacenter.
We don't know how much data still needs to be transferred to the other datacenter.

*Scenario*
One project needs to collect information from sensors located in different regions.

There are 2 datacenters, DC1 and DC2. sensor1 and sensor2 are in the DC1 region; sensor3 and sensor4 are in the DC2 region.

One client in DC1 pulls data every 10 minutes.
1. sensor3 in DC2 writes a record to DC2 at 8:59:55, and the record arrives in DC1 at 9:00:05.
2. The client in DC1 pulls data at 9:00. It should get the record, but it cannot, because the record has not yet arrived in DC1.
3. The client in DC1 then pulls data at 9:10. It still cannot get the record, because this pull covers 9:00 to 9:10, while the record was created at 8:59:55.

So we will miss the record. We need to measure the cross-DC replication latency so that we can look back far enough to pick up late-arriving data.
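A minimal sketch of the scenario above, assuming each record carries its creation time and the client queries by that time. It models the pulls in plain Python with no Cassandra driver; `pull`, `collect`, and the record tuples are hypothetical names for illustration only. Fixed 10-minute windows miss the 8:59:55 record, while extending each window backwards by a look-back margin at least as large as the cross-DC delay (an assumed 30 seconds here) catches it, at the cost of re-reading some rows that must then be de-duplicated.

```python
from datetime import datetime, timedelta

# Hypothetical record: (created_at, visible_in_dc1_at). As in the scenario,
# sensor3 writes at 8:59:55 and the row becomes readable in DC1 at 9:00:05.
RECORDS = [
    (datetime(2016, 7, 21, 8, 59, 55), datetime(2016, 7, 21, 9, 0, 5)),
]


def pull(now, window, lookback, records):
    """Return records already visible in DC1 at `now` whose created_at
    lies in [now - window - lookback, now)."""
    start = now - window - lookback
    return [r for r in records if r[1] <= now and start <= r[0] < now]


def collect(pull_times, window, lookback, records):
    """Run every pull and merge the results."""
    seen = set()
    for t in pull_times:
        for r in pull(t, window, lookback, records):
            seen.add(r)  # overlapping windows can return a row twice
    return seen


window = timedelta(minutes=10)
pulls = [datetime(2016, 7, 21, 9, 0), datetime(2016, 7, 21, 9, 10)]

print(len(collect(pulls, window, timedelta(0), RECORDS)))           # 0
print(len(collect(pulls, window, timedelta(seconds=30), RECORDS)))  # 1
```

The look-back margin only helps if it really bounds the replication delay, which is why the delay needs to be measured rather than guessed.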

*Thoughts*
1. We can get the network latency with ping or another monitoring tool, but that does not represent the latency of Cassandra data being transferred from one DC to another.

2. We can measure the latency between DCs, but the value is accumulated since the node started, so it cannot represent the current latency.
https://issues.apache.org/jira/browse/CASSANDRA-11569

3. Cassandra could periodically insert a record into a system table, so that we know exactly when the data arrived.
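Thoughts 2 and 3 can be illustrated together with a small sketch. It assumes a hypothetical heartbeat row written periodically in the remote DC with its local write time, clocks kept in sync across DCs (e.g. via NTP), and that the local DC notes when each row becomes readable; each row then yields one lag sample. `ReplicationLagTracker` is a hypothetical name, not an existing Cassandra API. It contrasts a mean accumulated since start, like the metric described in thought 2, with the maximum over the last N samples.

```python
from collections import deque


class ReplicationLagTracker:
    """Tracks cross-DC lag (in seconds) from hypothetical heartbeat rows."""

    def __init__(self, window_size):
        self.recent = deque(maxlen=window_size)  # keeps only the last N samples
        self.total = 0.0  # cumulative sum, like a counter since node start
        self.count = 0

    def observe(self, written_at, arrived_at):
        """Record one heartbeat: remote write time vs. local arrival time."""
        lag = arrived_at - written_at
        self.recent.append(lag)
        self.total += lag
        self.count += 1

    def cumulative_mean(self):
        return self.total / self.count

    def recent_max(self):
        return max(self.recent)


tracker = ReplicationLagTracker(window_size=10)
t = 0.0
for _ in range(100):  # a long healthy period: 0.5 s lag per heartbeat
    tracker.observe(written_at=t, arrived_at=t + 0.5)
    t += 60.0
for _ in range(5):    # a recent cross-DC slowdown: 10 s lag
    tracker.observe(written_at=t, arrived_at=t + 10.0)
    t += 60.0

print(round(tracker.cumulative_mean(), 3))  # 0.952
print(tracker.recent_max())                 # 10.0
```

After the slowdown the cumulative mean barely moves, while the windowed maximum reflects the current delay and could be used as the look-back margin for the client's pulls.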

Thanks in advance,

Stone