Posted to user@cassandra.apache.org by Anuj Wadehra <an...@yahoo.co.in> on 2015/06/24 19:43:13 UTC

Adding Nodes With Inconsistent Data

Hi,


We faced a scenario where we lost a small amount of data after adding 2 nodes to the cluster. There were intermittent dropped mutations in the cluster. I need to verify my understanding of how this may have happened in order to do a Root Cause Analysis:


Scenario: 3 nodes, RF=3, Read / Write CL= Quorum


1. Due to the overloaded cluster, some writes only happened on 2 nodes: node 1 and node 2, while the asynchronous mutations were dropped on node 3.

So say key K with token T was not written to node 3.


2. I added node 4, and suppose that as per the newly calculated ranges, token T is now supposed to have replicas on node 1, node 3 and node 4. Unfortunately, node 4 started bootstrapping from node 3, where key K was missing.


3. After the recommended 2 minute gap, I added node 5, and as per the new token distribution suppose token T is now supposed to have replicas on node 3, node 4 and node 5. Again, node 5 bootstrapped from node 3, where the data was missing.


So now key K is lost, and that's how we lost a few rows.


Moreover, in step 1 the situation could be worse: we could also have a scenario where some writes only happened on one of the three replicas, and Cassandra chooses replicas where this data is missing as the streaming source for the ranges moving to the 2 new nodes.
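
To make step 1 concrete, here is a minimal sketch using the DataStax Python driver (the contact point, keyspace and table names are made up for illustration): a QUORUM write against RF=3 is acknowledged as soon as 2 replicas respond, so the coordinator reports success even while the overloaded third replica silently drops the mutation.

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["node1"])         # hypothetical contact point
    session = cluster.connect("my_ks")   # hypothetical keyspace with RF=3

    # QUORUM = 2 of 3 replicas. The write below succeeds once node 1 and
    # node 2 acknowledge it, even if node 3 drops the mutation.
    insert = SimpleStatement(
        "INSERT INTO my_table (k, v) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.QUORUM)
    session.execute(insert, ("K", "some value"))

    # A later QUORUM read is only guaranteed to see K while at least 2 of
    # the replicas that acknowledged it remain replicas for K's token; the
    # bootstrap sequence described above removes them from the replica set.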


Am I making sense?


We are using C* 2.0.3.


Thanks

Anuj




Sent from Yahoo Mail on Android


Re: Adding Nodes With Inconsistent Data

Posted by Robert Coli <rc...@eventbrite.com>.
On Sun, Jun 28, 2015 at 10:46 AM, Anuj Wadehra <an...@yahoo.co.in>
wrote:

> Thanks Jake!! But I think most people have 2.0.x in production right now,
> as 2.1.6 was only very recently declared production ready. I think the bug
> is too important to be left open in 2.0.x, as it leads to data loss. Should
> I open a JIRA?
>

Sorry, but this is the bug that provides very strong support for the Coli
Conjecture:

"Anyone who is storing their data in Cassandra[1] probably doesn't actually
care about consistency, even if they think they do."

This bug has existed in every version of Cassandra ever. People have run
these versions of Cassandra in production, and many of them have not
regularly run repair once every gc_grace_seconds.

Few to none of them have ever detected any problem which resulted from the
lack of consistency which must occur in an unrepaired cluster with
CASSANDRA-2434. I am forced to conclude from this data that resolving
CASSANDRA-2434 is actually not that important.

Yes, this is a surprising and somewhat troubling formulation. But if you
think about it, from the perspective of the application, a failed
non-idempotent write which nonetheless is persisted is also fatal to
consistency...
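
For example (a hedged sketch with made-up table and host names): a counter increment that times out may nonetheless have been applied on a replica, so the obvious client-side retry can double-count, and not retrying can undercount; either way, strict consistency is already gone from the application's point of view.

    from cassandra import ConsistencyLevel, WriteTimeout
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(["node1"]).connect("my_ks")   # hypothetical
    bump = SimpleStatement(
        "UPDATE page_views SET views = views + 1 WHERE page = %s",
        consistency_level=ConsistencyLevel.QUORUM)

    try:
        session.execute(bump, ("home",))
    except WriteTimeout:
        # The coordinator gave up, but some replica may already have applied
        # the increment. Retrying this non-idempotent write may count the
        # view twice; dropping it may lose the view entirely.
        session.execute(bump, ("home",))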

=Rob
 [1] Cassandra is cited, but this is likely more of a statement about
distribution-appropriate applications, not Cassandra specifically...

Re: Adding Nodes With Inconsistent Data

Posted by Anuj Wadehra <an...@yahoo.co.in>.
Thanks Jake!! But I think most people have 2.0.x in production right now, as 2.1.6 was only very recently declared production ready. I think the bug is too important to be left open in 2.0.x, as it leads to data loss. Should I open a JIRA?

Thanks
Anuj Wadehra


     On Thursday, 25 June 2015 2:47 AM, Jake Luciani <ja...@gmail.com> wrote:
   

 This is no longer an issue in 2.1. https://issues.apache.org/jira/browse/CASSANDRA-2434
We now make sure the replica we bootstrap from is the one that will no longer own that range

Re: Adding Nodes With Inconsistent Data

Posted by Jake Luciani <ja...@gmail.com>.
This is no longer an issue in 2.1.
https://issues.apache.org/jira/browse/CASSANDRA-2434

We now make sure the replica we bootstrap from is the one that will no
longer own that range.
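
A toy illustration of that selection rule (this is not the actual Cassandra code, just a sketch of the idea): among the nodes that currently replicate a range, the bootstrapping node streams from the one that will stop being a replica once it joins, so the set of distinct copies of the data is preserved.

    def pick_stream_source(old_replicas, new_replicas):
        """Pick the old replica that loses the range to the new node."""
        losing = [n for n in old_replicas if n not in new_replicas]
        return losing[0] if losing else old_replicas[0]

    # In the thread's example, token T was replicated on nodes 1, 2 and 3,
    # and after node 4 joins it is replicated on nodes 1, 3 and 4. Node 2
    # (which has key K) is the node that loses the range, so node 4 should
    # stream from node 2 rather than from node 3, which dropped the write.
    print(pick_stream_source([1, 2, 3], [1, 3, 4]))   # -> 2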



-- 
http://twitter.com/tjake

Re: Adding Nodes With Inconsistent Data

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
It looks to me like this can indeed happen, theoretically (I might be wrong).

However,

- Hinted Handoff tends to remove this issue; if this is a big worry, you
might want to make sure HH is enabled and well tuned.
- Read Repair (synchronous or not) might also have mitigated things, if you
read fresh data. You can set this to a higher value.
- After an outage, you should always run a nodetool repair on the node that
went down - following the best practices, or because you understand the
reasons - or just trust HH if it is enough for you. (See the sketch after
this list.)

So I would say that you can always "shoot yourself in the foot", whatever
you do, yet following best practices or understanding the internals is the
key imho.

I would say it is a good question though.

Alain.


