Posted to user@cassandra.apache.org by Kaushal Shriyan <ka...@gmail.com> on 2022/09/23 14:58:08 UTC
Cassandra data sync time
Hi,
Is there a way to measure the data sync time between the Cassandra nodes
in DC1 and DC2? DC1 is currently the prod data center, and I am adding DC2
as a new data center by following
https://docs.apigee.com/private-cloud/v4.51.00/adding-data-center?hl=en.
https://docs.apigee.com/release/supported-software
Cassandra version: 2.1.22
Specifically, is there a way to measure the time taken to sync the data
from the current prod DC1 (Cassandra nodes 1, 2, 3) to the new DC2
(Cassandra nodes 4, 5, 6)?
Thanks in advance.
Best Regards,
Kaushal
Re: Cassandra data sync time
Posted by Bowen Song via user <us...@cassandra.apache.org>.
It looks like you have a replication factor of 3 and a total data size of
1.43 GB per node. That's a very small amount of data. Assuming the
bottleneck is the network, not CPU or disk, and your 50 Mbps bandwidth
is between each pair of servers across the two DCs (i.e. not the total
bandwidth available between the DCs), the streaming process itself
should only take minutes.
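As a rough check of that estimate, here is a minimal back-of-the-envelope sketch in Python. It assumes each new node receives roughly one node's worth of data (1.43 GB), reads "GB" as decimal gigabytes, and ignores protocol overhead, compression and disk or CPU limits. The function name and the second "shared link" case are illustrative assumptions, not something taken from the thread.

```python
def transfer_seconds(data_bytes: float, link_bits_per_second: float) -> float:
    """Ideal time to move data_bytes over a link, ignoring all overhead."""
    return data_bytes * 8 / link_bits_per_second


GB = 10**9                   # assumption: decimal gigabytes
per_node_load = 1.43 * GB    # "Load" column from nodetool status
link = 50 * 10**6            # 50 Mbps

# Case 1: every DC1 -> DC2 server pair has its own 50 Mbps path.
per_pair = transfer_seconds(per_node_load, link)

# Case 2 (assumption, for comparison): 50 Mbps is the total between the DCs
# and all three new nodes share it.
shared = transfer_seconds(3 * per_node_load, link)

print(f"{per_pair / 60:.1f} min per node, {shared / 60:.1f} min if shared")
# prints "3.8 min per node, 11.4 min if shared"
```

Either way the result is in the range of minutes, so a much longer real-world duration would point at something other than raw bandwidth.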
On 26/09/2022 12:14, Kaushal Shriyan wrote:
>
>
>
> On Fri, Sep 23, 2022 at 8:39 PM Bowen Song via user
> <us...@cassandra.apache.org> wrote:
>
> What's your definition of "sync"? Streaming all the existing data
> to the new DC? or the time lag between a write request is
> completed in one DC and the other DC?
>
> The former can be estimated based on a few facts about your setup
> (number of nodes, data size, etc.) and some measured data
> (streaming speed).
>
> The latter is usually just slightly above the network latency, but
> can spike up if and when the network between DCs suffer from
> temporary connectivity issues.
>
>
> Hi Bowen,
>
> Thanks for the quick response. I was referring to streaming all the
> existing data to the new DC(DC2). We have
>
>
>
> Hi Bowen,
>
> Thanks for the quick response. Streaming all the existing data from
> the current prod DC1 (Cassandra Node 1, 2 ,3) to the new DC2
> (Cassandra Node 4, 5 ,6). Data bandwidth between DC1 and DC2 is around
> 50 Mbps. Please let me know if you need any additional details. Thanks
> in advance.
>
> /opt/apigee/apigee-cassandra/bin/nodetool status
> Datacenter: dc-1
> ================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address         Load     Tokens  Owns (effective)  Host ID                               Rack
> UN  192.198.11.4    1.43 GB  1       100.0%            dbfbd44f-kec5-4f91-bc7d-c31582aec35a  ra-1
> UN  192.198.11.128  1.43 GB  1       100.0%            bc55019c-8ccb-4403-9dc4-481b90a262f6  ra-1
> UN  192.198.11.3    1.43 GB  1       100.0%            4402901c-4562-4f0f-b14a-4eed40a9836c  ra-1
>
> _On Node1_
> du -ch /opt/apigee/data/apigee-cassandra/data
> 1.7G total
>
> _On Node2_
> du -ch /opt/apigee/data/apigee-cassandra/data
>
> _On Node3_
> du -ch /opt/apigee/data/apigee-cassandra/data
>
> Best Regards,
>
> Kaushal
Re: Cassandra data sync time
Posted by Kaushal Shriyan <ka...@gmail.com>.
On Fri, Sep 23, 2022 at 8:39 PM Bowen Song via user <
user@cassandra.apache.org> wrote:
> What's your definition of "sync"? Streaming all the existing data to the
> new DC? or the time lag between a write request is completed in one DC and
> the other DC?
>
> The former can be estimated based on a few facts about your setup (number
> of nodes, data size, etc.) and some measured data (streaming speed).
>
> The latter is usually just slightly above the network latency, but can
> spike up if and when the network between DCs suffer from temporary
> connectivity issues.
>
Hi Bowen,
Thanks for the quick response. I was referring to streaming all the
existing data to the new DC(DC2). We have
Hi Bowen,
Thanks for the quick response. I mean streaming all the existing data from
the current prod DC1 (Cassandra nodes 1, 2, 3) to the new DC2 (Cassandra
nodes 4, 5, 6). The bandwidth between DC1 and DC2 is around 50 Mbps. Please
let me know if you need any additional details. Thanks in advance.
/opt/apigee/apigee-cassandra/bin/nodetool status
Datacenter: dc-1
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load     Tokens  Owns (effective)  Host ID                               Rack
UN  192.198.11.4    1.43 GB  1       100.0%            dbfbd44f-kec5-4f91-bc7d-c31582aec35a  ra-1
UN  192.198.11.128  1.43 GB  1       100.0%            bc55019c-8ccb-4403-9dc4-481b90a262f6  ra-1
UN  192.198.11.3    1.43 GB  1       100.0%            4402901c-4562-4f0f-b14a-4eed40a9836c  ra-1
*On Node1*
du -ch /opt/apigee/data/apigee-cassandra/data
1.7G total
*On Node2*
du -ch /opt/apigee/data/apigee-cassandra/data
*On Node3*
du -ch /opt/apigee/data/apigee-cassandra/data
Best Regards,
Kaushal
Re: Cassandra data sync time
Posted by Bowen Song via user <us...@cassandra.apache.org>.
What's your definition of "sync"? Streaming all the existing data to the
new DC? Or the time lag between a write request completing in one DC
and in the other DC?
The former can be estimated based on a few facts about your setup
(number of nodes, data size, etc.) and some measured data (streaming speed).
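One way to turn a measured streaming speed into a time estimate is sketched below in Python. It assumes you can take two readings of how many bytes a new node has received so far (for example by sampling the streaming progress that `nodetool netstats` reports on the receiving node) and that you know roughly how much data it will receive in total. The function name and the sample numbers are hypothetical.

```python
def estimate_remaining_seconds(bytes_t0: float, bytes_t1: float,
                               interval_s: float, total_bytes: float) -> float:
    """Estimate remaining streaming time from two progress samples.

    bytes_t0 / bytes_t1: bytes received at the first / second sample
    interval_s:          seconds between the two samples
    total_bytes:         expected total to be received by this node
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rate = (bytes_t1 - bytes_t0) / interval_s  # measured bytes per second
    if rate <= 0:
        return float("inf")                    # no progress between samples
    return max(total_bytes - bytes_t1, 0) / rate


# Hypothetical readings: 200 MB received, then 350 MB thirty seconds later,
# with 1430 MB expected in total.
remaining = estimate_remaining_seconds(200e6, 350e6, 30, 1430e6)
print(f"about {remaining:.0f} s remaining")
```

Streaming speed tends to fluctuate, so repeating the sampling a few times gives a more trustworthy figure than a single pair of readings.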
The latter is usually just slightly above the network latency, but can
spike if and when the network between the DCs suffers from temporary
connectivity issues.
On 23/09/2022 15:58, Kaushal Shriyan wrote:
> Hi,
>
> Is there a way to measure cassandra nodes data sync time between DC1
> and DC2? Currently DC1 is the prod datacenter. I am adding DC2 to the
> new data center by referring to
> https://docs.apigee.com/private-cloud/v4.51.00/adding-data-center?hl=en.
>
> https://docs.apigee.com/release/supported-software
> Cassandra version :- 2.1.22
>
> Is there a way to measure the time taken to sync the data in current
> prod DC1 (Cassandra Node 1, 2 ,3) and the new DC2 (Cassandra Node 4, 5
> ,6)?
>
> Thanks in advance.
>
> Best Regards,
>
> Kaushal