Posted to user@cassandra.apache.org by Kaushal Shriyan <ka...@gmail.com> on 2022/09/23 14:58:08 UTC

Cassandra data sync time

Hi,

Is there a way to measure Cassandra node data sync time between DC1 and
DC2? Currently DC1 is the prod datacenter. I am adding DC2 as a new data
center by following
https://docs.apigee.com/private-cloud/v4.51.00/adding-data-center?hl=en.

https://docs.apigee.com/release/supported-software
Cassandra version: 2.1.22

Is there a way to measure the time taken to sync the data between the
current prod DC1 (Cassandra nodes 1, 2, 3) and the new DC2 (Cassandra
nodes 4, 5, 6)?

Thanks in advance.

Best Regards,

Kaushal

Re: Cassandra data sync time

Posted by Bowen Song via user <us...@cassandra.apache.org>.
It looks like you have a replication factor of 3 and a total data size of
1.43 GB per node. That's a very small amount of data. Assuming the
bottleneck is the network, not CPU or disk, and your 50 Mbps bandwidth
is between each pair of servers across the two DCs (i.e. not the total
bandwidth available between the DCs), the streaming process itself
should only take minutes.
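
To make that estimate concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the 1.43 GB Load figure from nodetool status is in decimal gigabytes, that each of the three new DC2 nodes receives one full copy of the data (as the 100% ownership suggests), and it ignores compression, protocol overhead and any streaming throttle. Treat the result as a lower bound rather than a prediction. The function and constant names are illustrative only.

```python
def transfer_seconds(data_bytes: float, link_mbps: float) -> float:
    """Seconds needed to move data_bytes over a link of link_mbps megabits/s."""
    return data_bytes * 8 / (link_mbps * 1_000_000)


LOAD_BYTES = 1.43e9   # per-node Load from nodetool status (decimal GB assumed)
LINK_MBPS = 50.0      # bandwidth figure quoted in the thread
NEW_NODES = 3         # Cassandra nodes 4, 5, 6 in DC2

# Case 1: 50 Mbps is available to each pair of servers, streams run in parallel.
per_pair = transfer_seconds(LOAD_BYTES, LINK_MBPS)

# Case 2: 50 Mbps is the total between the DCs, shared by all three streams.
shared = transfer_seconds(LOAD_BYTES * NEW_NODES, LINK_MBPS)

print(f"per-pair link: {per_pair / 60:.1f} min")   # about 3.8 min
print(f"shared link:   {shared / 60:.1f} min")     # about 11.4 min
```

Even in the less favourable case, where the 50 Mbps is shared by all three streams, the same arithmetic stays well under an hour.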

On 26/09/2022 12:14, Kaushal Shriyan wrote:
>
>
>
> On Fri, Sep 23, 2022 at 8:39 PM Bowen Song via user 
> <us...@cassandra.apache.org> wrote:
>
>     What's your definition of "sync"? Streaming all the existing data
>     to the new DC? Or the time lag between a write request completing
>     in one DC and in the other DC?
>
>     The former can be estimated based on a few facts about your setup
>     (number of nodes, data size, etc.) and some measured data
>     (streaming speed).
>
>     The latter is usually just slightly above the network latency, but
>     can spike if and when the network between DCs suffers from
>     temporary connectivity issues.
>
>
> Hi Bowen,
>
> Thanks for the quick response. I was referring to streaming all the 
> existing data to the new DC(DC2). We have
>
> Hi Bowen,
>
> Thanks for the quick response. Streaming all the existing data from 
> the current prod DC1 (Cassandra Node 1, 2 ,3) to the new DC2 
> (Cassandra Node 4, 5 ,6). Data bandwidth between DC1 and DC2 is around 
> 50 Mbps. Please let me know if you need any additional details. Thanks 
> in advance.
>
> /opt/apigee/apigee-cassandra/bin/nodetool status
> Datacenter: dc-1
> ================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address         Load     Tokens  Owns (effective)  Host ID                               Rack
> UN  192.198.11.4    1.43 GB  1       100.0%            dbfbd44f-kec5-4f91-bc7d-c31582aec35a  ra-1
> UN  192.198.11.128  1.43 GB  1       100.0%            bc55019c-8ccb-4403-9dc4-481b90a262f6  ra-1
> UN  192.198.11.3    1.43 GB  1       100.0%            4402901c-4562-4f0f-b14a-4eed40a9836c  ra-1
>
> _On Node1_
> du -ch /opt/apigee/data/apigee-cassandra/data
> 1.7G total
>
> _On Node2_
> du -ch /opt/apigee/data/apigee-cassandra/data
>
> _On Node3_
> du -ch /opt/apigee/data/apigee-cassandra/data
>
> Best Regards,
>
> Kaushal

Re: Cassandra data sync time

Posted by Kaushal Shriyan <ka...@gmail.com>.
On Fri, Sep 23, 2022 at 8:39 PM Bowen Song via user <
user@cassandra.apache.org> wrote:

> What's your definition of "sync"? Streaming all the existing data to the
> new DC? Or the time lag between a write request completing in one DC and
> in the other DC?
>
> The former can be estimated based on a few facts about your setup (number
> of nodes, data size, etc.) and some measured data (streaming speed).
>
> The latter is usually just slightly above the network latency, but can
> spike if and when the network between DCs suffers from temporary
> connectivity issues.
>

Hi Bowen,

Thanks for the quick response. I was referring to streaming all the
existing data to the new DC(DC2). We have




Hi Bowen,

Thanks for the quick response. Streaming all the existing data from the
current prod DC1 (Cassandra Node 1, 2 ,3) to the new DC2 (Cassandra Node 4,
5 ,6). Data bandwidth between DC1 and DC2 is around 50 Mbps. Please let me
know if you need any additional details. Thanks in advance.

/opt/apigee/apigee-cassandra/bin/nodetool status
Datacenter: dc-1
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load     Tokens  Owns (effective)  Host ID                               Rack
UN  192.198.11.4    1.43 GB  1       100.0%            dbfbd44f-kec5-4f91-bc7d-c31582aec35a  ra-1
UN  192.198.11.128  1.43 GB  1       100.0%            bc55019c-8ccb-4403-9dc4-481b90a262f6  ra-1
UN  192.198.11.3    1.43 GB  1       100.0%            4402901c-4562-4f0f-b14a-4eed40a9836c  ra-1

*On Node1*
du -ch /opt/apigee/data/apigee-cassandra/data
1.7G total


*On Node2*
du -ch /opt/apigee/data/apigee-cassandra/data


*On Node3*
du -ch /opt/apigee/data/apigee-cassandra/data

Best Regards,

Kaushal

Re: Cassandra data sync time

Posted by Bowen Song via user <us...@cassandra.apache.org>.
What's your definition of "sync"? Streaming all the existing data to the
new DC? Or the time lag between a write request completing in one DC
and in the other DC?

The former can be estimated based on a few facts about your setup
(number of nodes, data size, etc.) and some measured data (streaming speed).

The latter is usually just slightly above the network latency, but can
spike if and when the network between DCs suffers from temporary
connectivity issues.
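
One hedged way to obtain the measured streaming speed mentioned above is to time the rebuild itself. The Python sketch below is not an official tool: timed_run and throughput_mbps are hypothetical helper names, and the commented-out nodetool invocation assumes the rebuild is run on a DC2 node with dc-1 as the source DC and that the command blocks until streaming finishes.

```python
import subprocess
import time


def timed_run(cmd):
    """Run cmd to completion and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    result = subprocess.run(cmd)
    return result.returncode, time.monotonic() - start


def throughput_mbps(bytes_moved, seconds):
    """Average throughput in megabits per second."""
    return bytes_moved * 8 / seconds / 1_000_000


# Hypothetical usage on one of the new DC2 nodes (path and DC name taken
# from the nodetool status output in this thread):
#
#   code, secs = timed_run(
#       ["/opt/apigee/apigee-cassandra/bin/nodetool", "rebuild", "--", "dc-1"])
#   if code == 0:
#       # Use the node's Load after the rebuild as the number of bytes moved.
#       print(f"{secs:.0f} s, {throughput_mbps(1.43e9, secs):.1f} Mbps")
```

Dividing the node's Load after the rebuild by the elapsed time gives an average streaming speed that can be fed back into the estimate for the remaining nodes.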
