Posted to user@cassandra.apache.org by Vijay Patil <vi...@gmail.com> on 2016/04/05 08:26:32 UTC

cross DC data sync starts without rebuilding nodes on new DC

Hi,

I have configured a new DC following the instructions at the link below.
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html

After completing step 7.a (altering the keyspace with the desired
replication factor for the new DC), data automatically starts syncing from
the existing DC to the new DC. This is not expected, because auto_bootstrap
is false on all nodes (in both the existing and the new DC).
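For reference, step 7.a boils down to a single CQL statement that adds the new DC to the keyspace's NetworkTopologyStrategy replication map. The helper below is a minimal sketch that only builds that statement as a string; the keyspace name "my_ks", the DC names "DC1"/"DC2" and the RF of 3 are hypothetical placeholders. With GossipingPropertyFileSnitch, the DC names must match the dc= values in each node's cassandra-rackdc.properties.

```python
# Hypothetical keyspace/DC names; DC names must match the snitch's dc= values.
def alter_keyspace_cql(keyspace: str, rf_by_dc: dict[str, int]) -> str:
    """Build an ALTER KEYSPACE statement that sets a per-DC replication
    factor with NetworkTopologyStrategy (what step 7.a runs in cqlsh)."""
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_by_dc.items())
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Keep the existing DC's RF and add the new DC alongside it.
print(alter_keyspace_cql("my_ks", {"DC1": 3, "DC2": 3}))
```

Note that the existing DC has to stay in the map: the statement replaces the whole replication setting rather than merging into it.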

Any idea why this is happening?
If this is the desired behaviour, what is the purpose of rebuilding each
node in the new DC (step 7.b)?

Cassandra version is 2.0.17 on all nodes in both DCs, and I am using
GossipingPropertyFileSnitch.

Regards,
Vijay

RE: cross DC data sync starts without rebuilding nodes on new DC

Posted by SE...@homedepot.com.
What do you mean by “automatically starts syncing”? Are you seeing existing data being streamed, or just new incoming data being added? Do you perhaps have repairs running as part of your automated maintenance?

I would expect new incoming data to be replicated to the new DC after the “alter keyspace.” But I would not expect any streaming of existing data unless repairs are running somewhere.

Also, does your nodetool status reflect the new DC?

Sean Durity


Re: cross DC data sync starts without rebuilding nodes on new DC

Posted by Vijay Patil <vi...@gmail.com>.
Thanks Alain and Sean, your detailed explanations answer my question.

Yes, nodetool status reflects the new DC, and nodetool netstats shows no
streams. All my writes go to the old DC with LOCAL_QUORUM, so yes, this new
data is probably what is being replicated to the new DC (no repair was
running anywhere).
I will proceed with rebuilding the nodes in the new DC.
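Step 7.b means running nodetool rebuild on every node of the new DC, naming the existing DC as the streaming source. The helper below is a minimal sketch that only prints those invocations (it does not run them); the host names "new-dc-node1"/"new-dc-node2" and the source DC name "DC1" are hypothetical.

```python
import shlex


def rebuild_commands(new_dc_hosts: list[str], source_dc: str) -> list[list[str]]:
    # One invocation per node in the new DC; each node streams the existing
    # data for its own token ranges from source_dc.
    return [["nodetool", "-h", host, "rebuild", source_dc] for host in new_dc_hosts]


# Hypothetical host names and source DC name.
for argv in rebuild_commands(["new-dc-node1", "new-dc-node2"], "DC1"):
    print(shlex.join(argv))
```

While a rebuild is running, nodetool netstats on that node should list active streams, which is the check Alain suggests.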

Thanks,
Vijay


Re: cross DC data sync starts without rebuilding nodes on new DC

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hi Vijay.

> After completing step 7.a (which is altering keyspace with desired
> replication factor for new DC) data automatically starts syncing from
> existing DC to new DC
>

My guess: what you are seeing is not data syncing. Well, it is, but it is
new writes being replicated, not old data being streamed. As soon as you set
the RF for the new DC, it starts receiving writes.

Some background:
Using a LOCAL_X consistency level does not mean that data won't be copied to
all the DCs; it means the coordinator won't wait for acknowledgements from
nodes in other DCs. The write still goes to every DC listed in the keyspace
replication settings. So as soon as you say "I want X copies of the data in
the new datacenter", new data starts being replicated there.
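The point about LOCAL_X consistency levels can be made concrete with a toy model of a single write. This is a simplified sketch, not Cassandra's code: it ignores failures, hints and timeouts, and the DC names and RFs are hypothetical. It shows that the set of DCs receiving the mutation comes from the keyspace replication map, while LOCAL_QUORUM only sets how many local acks the coordinator waits for (floor(local RF / 2) + 1).

```python
from dataclasses import dataclass


@dataclass
class WritePlan:
    replicas_per_dc: dict[str, int]  # DC name -> replicas the mutation is sent to
    acks_waited_for: int             # acks the coordinator blocks on


def local_quorum_write(rf_by_dc: dict[str, int], coordinator_dc: str) -> WritePlan:
    # The mutation is sent to every replica in every DC named in the keyspace's
    # replication settings; the consistency level does not change that.
    replicas = {dc: rf for dc, rf in rf_by_dc.items() if rf > 0}
    # LOCAL_QUORUM only controls how many acks the coordinator waits for:
    # a quorum of the replicas in its own DC.
    acks = rf_by_dc[coordinator_dc] // 2 + 1
    return WritePlan(replicas, acks)


# Hypothetical DC names and RFs, before and after step 7.a.
before = local_quorum_write({"DC1": 3}, "DC1")
after = local_quorum_write({"DC1": 3, "DC2": 3}, "DC1")
print(before)
print(after)
```

The client-visible latency is the same in both cases (two local acks), yet after the keyspace change every new write also lands in the new DC, which is the "syncing" Vijay observed.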

To check:

Are you writing to your original DC?
Does 'nodetool netstats' say 'No streams', as I expect?

While rebuilding, run this command again and you should see streams.

> Any idea why it's happening?
> If this is the desired behaviour then what's the purpose of rebuilding
> each node on new DC (step 7.b)?
>

So basically, the rebuild lets the new DC get the *old* / *existing* data
streamed from another DC. We use rebuild instead of auto_bootstrap to avoid
nodes trying to stream data as soon as they are added to the new DC: we want
to add *all* the nodes first, so that ranges are distributed evenly, and only
then start streaming, so that we stream just the right amount of data from
the DC of our choice.
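A back-of-the-envelope model shows why adding every node before streaming saves work. This is a simplified sketch under stated assumptions: evenly spaced tokens, one DC considered on its own, and a hypothetical scenario where new-DC nodes bootstrap one at a time with the new DC's RF already set. A node in a DC with fewer nodes than RF holds a full copy; otherwise it holds RF/N of the data.

```python
from fractions import Fraction


def share_per_node(rf: int, nodes: int) -> Fraction:
    # Fraction of the keyspace one node in the DC holds, assuming evenly
    # spaced tokens; with fewer nodes than RF, each node holds a full copy.
    return min(Fraction(1), Fraction(rf, nodes))


def streamed_one_by_one(rf: int, nodes: int) -> Fraction:
    # Each joining node streams its whole share at the moment it joins,
    # before later nodes exist to take some of those ranges away.
    return sum((share_per_node(rf, k) for k in range(1, nodes + 1)), Fraction(0))


def streamed_rebuild_after_all_added(rf: int, nodes: int) -> Fraction:
    # All nodes are in the ring first, so each streams only its final share.
    return nodes * share_per_node(rf, nodes)


# Hypothetical new DC: 6 nodes, RF 3. Results are in "copies of the dataset".
print(streamed_one_by_one(3, 6))               # 97/20, i.e. 4.85 copies
print(streamed_rebuild_after_all_added(3, 6))  # 3 copies
```

Under these assumptions the one-by-one approach transfers about 1.85 extra copies of the data, and the early nodes are left holding ranges they no longer own. Rebuilding once all nodes are present also lets you pick the source DC explicitly.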

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


2016-04-05 8:26 GMT+02:00 Vijay Patil <vi...@gmail.com>:

> Hi,
>
> I have configured new DC as per instructions at below link.
>
> https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
>
> After completing step 7.a (which is altering keyspace with desired
> replication factor for new DC) data automatically starts syncing from
> existing DC to new DC, which is not expected because auto_bootstrap is
> false on all nodes (existing as well as new DC).
>
> Any idea why it's happening?
> If this is the desired behaviour then what's the purpose of rebuilding
> each node on new DC (step 7.b)?
>
> Cassandra version is 2.0.17 on all nodes in both DC's and I am using
> GossipingPropertyFileSnitch.
>
> Regards,
> Vijay
>