Posted to user@cassandra.apache.org by Akshay Bhardwaj <ak...@gmail.com> on 2018/10/30 20:58:46 UTC

Cassandra | Cross Data Centre Replication Status

Hi Experts,

I previously had one Cassandra data centre in the AWS Singapore region,
with 5 nodes and my keyspace's replication factor set to 3 using
NetworkTopologyStrategy.

After this cluster had been running smoothly for 4 months (500 GB of data
on each node's disk), I added a 2nd data centre in the AWS Mumbai region,
again with 5 nodes using NetworkTopologyStrategy.

After updating my keyspace's replication settings to
{"AWS_Sgp":3,"AWS_Mum":3}, my expectation was that the data already
present in the Sgp region would immediately start replicating to the Mum
region's nodes. However, even after 2 weeks I do not see the historical
data being replicated, although new data written in the Sgp region is
present in the Mum region as well.
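
For reference, the statement I ran looked something like this (the
keyspace name "my_ks" is a placeholder):

    cqlsh -e "ALTER KEYSPACE my_ks WITH replication =
      {'class': 'NetworkTopologyStrategy', 'AWS_Sgp': 3, 'AWS_Mum': 3};"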

Any help or suggestions for debugging this issue would be highly
appreciated.

Regards
Akshay Bhardwaj
+91-97111-33849

Re: Cassandra | Cross Data Centre Replication Status

Posted by Alexander Dejanovski <al...@thelastpickle.com>.
Akshay,

Avoid running repair in that case. It will take far longer than rebuild,
and it will stream data back to your original DC, even between nodes
within that original DC, which is not what you're after and could lead to
all sorts of trouble.

Run "nodetool rebuild <original dc>" as recommended by Jon and Surbhi. All
the data in the original DC will be streamed out to the new one, including
the data that was already written since you altered your keyspace
replication settings (so 2 weeks of data). It will then use some extra disk
space until compaction catches up.
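
To keep an eye on that while the rebuild runs, the standard nodetool
commands are enough on the rebuilding nodes:

    # Streaming sessions currently in flight:
    nodetool netstats

    # Compactions still working through the newly streamed SSTables:
    nodetool compactionstats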

Cheers,


-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: Cassandra | Cross Data Centre Replication Status

Posted by Surbhi Gupta <su...@gmail.com>.
Repair will take way more time than rebuild.


Re: Cassandra | Cross Data Centre Replication Status

Posted by Kiran mk <co...@gmail.com>.
Run the repair with the -pr option on each node; it repairs only that
node's primary partition range.

nodetool repair -pr


-- 
Best Regards,
Kiran.M.K.



Re: Cassandra | Cross Data Centre Replication Status

Posted by Surbhi Gupta <su...@gmail.com>.
Nodetool repair will take way more time than nodetool rebuild.
How much data do you have in your original data centre?
Repair should be run to make the data consistent after a node has been
down for longer than the hinted handoff window, or after dropped
mutations. But as a rule of thumb, we generally run repair using
OpsCenter (if using DataStax).

So in your case, run "nodetool rebuild <original data centre>" on all
the nodes in the new data centre.
To make the rebuild process faster, increase three parameters: compaction
throughput, stream throughput, and inter-DC stream throughput (see the
sketch below).
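
A sketch of those three knobs (the numbers are illustrative only, and
the units are as of Cassandra 3.x; check your cassandra.yaml):

    nodetool setcompactionthroughput 64       # MB/s
    nodetool setstreamthroughput 200          # megabits/s
    nodetool setinterdcstreamthroughput 200   # megabits/s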

Thanks
Surbhi

Re: Cassandra | Cross Data Centre Replication Status

Posted by Akshay Bhardwaj <ak...@gmail.com>.
Hi Jonathan,

That makes sense. Thank you for the explanation.

Another quick question: as the cluster is still operative and the data
for the past 2 weeks (since updating the replication factor) is present
in both data centres, should I run "nodetool rebuild" or "nodetool
repair"?

I read that nodetool rebuild is faster but is only useful while the new
data centre is still empty, with no partition keys present. So when is
the right time to use each command, and what impact can they have on
data centre operations?

Thanks and Regards
Akshay Bhardwaj
+91-97111-33849



Re: Cassandra | Cross Data Centre Replication Status

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
You need to run "nodetool rebuild -- <existing-dc-name>" on each node in
the new DC to get the old data to replicate.  It doesn't do it
automatically because Cassandra has no way of knowing if you're done adding
nodes and if it were to migrate automatically, it could cause a lot of
problems. Imagine streaming 100 nodes data to 3 nodes in the new DC, not
fun.
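
Concretely, with the DC names from your original mail, that would be
something like this on each node in the Mumbai DC:

    nodetool rebuild -- AWS_Sgp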



-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade