Posted to user@cassandra.apache.org by Joe Alex <jo...@gmail.com> on 2010/11/01 19:22:29 UTC

Question on Hinted Handoff

I am running cassandra 0.6.6
4 nodes with RF=3
Have set the InitialTokens manually
Loaded around 4 million records

Had a question why the following is happening

Node 4 was down when a new key 1005 was added (value 123).
Node 2 which is responsible for the key added a Hint for Node 4
Node 4 was brought back up and noticed the Hints Handed off and data
started showing up in Node 4
Noticed a ReadRepair also happening
All fine so far

did a get and the value is 123
Node 2 returned the data, with background digest checks on Node 3 and
Node 4 (RF=3)

Now Node 2 (responsible for key 1005) was taken down
Key 1005 value was updated to A123 (ApplyRowMutation on Node 3 and Node 4).
Node 4 added a hint for Node 2

did a get and the value is A123
Node 3 returned the data, with background digest checks on Node 4
(RF=3 and Node 2 is down)

Now Node 2 is back up
Hints were handed off by Node 4

did a get and the value is the old value 123
Node 2 returned the data, with background digest checks on Node 3 and
Node 4 (RF=3)

Was expecting the latest write wins - A123 written on Node 4 to be in Node 2.
Any ideas ?

Now if Node 2 is down the new value A123 will be returned
Tried a repair when Node 2 was up and all Nodes got updated to the old data

Re: Question on Hinted Handoff

Posted by Brandon Williams <dr...@gmail.com>.
On Tue, Nov 2, 2010 at 8:32 AM, Joe Alex <jo...@gmail.com> wrote:

> Thanks, that clarifies why HH did not work, So have to use .7. Is
> there .6.7 ? I am using .6.6 now.
>

You can use the 0.6 branch instead, it's generally more stable than trunk.


> I did see the log entry in Node 4, adding a Hint for Node 2 and when
> Node 2 came up noticed a log entry in Node 4 that 1 row Handed Off -
> so I thought it was working.
>

This is why it was a pernicious little bug that got me.   Logs indicated it
worked, it just...didn't.

> About RR - a read in Node 1 or Node 2 indicated a RR (see log entry
> above) was happening. But don't think that ever happened. Those entries
> keep repeating every time a read in Node 1 and 2 and kept giving old
> data.
>

When you read at CL.ONE, RR is asynchronous -- it happens after your data is
returned.  CL.QUORUM would return the right data after performing the RR.
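
A minimal sketch of that difference, in Python. This is a toy in-memory
model, not Cassandra's code: the names ToyCluster, read_one, read_quorum
and run_background_repairs are hypothetical, and the only rule assumed is
that the column with the highest timestamp wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    value: str
    timestamp: int  # write timestamp; the highest one wins


def newest(columns):
    """Reconcile replicas: pick the column with the highest timestamp."""
    return max(columns, key=lambda c: c.timestamp)


class ToyCluster:
    """Hypothetical stand-in for the replicas of a single key."""

    def __init__(self, replicas):
        self.replicas = dict(replicas)  # node name -> Column on that node
        self.pending_repairs = []       # repairs queued by CL.ONE reads

    def read_one(self, node):
        """CL.ONE: answer from one replica now, repair in the background."""
        result = self.replicas[node]
        self.pending_repairs.append(newest(self.replicas.values()))
        return result

    def read_quorum(self, nodes):
        """CL.QUORUM: reconcile the contacted replicas before answering."""
        winner = newest(self.replicas[n] for n in nodes)
        for n in nodes:
            self.replicas[n] = winner
        return winner

    def run_background_repairs(self):
        """Apply queued repairs, as the asynchronous RR would later do."""
        for winner in self.pending_repairs:
            for n, col in self.replicas.items():
                if col.timestamp < winner.timestamp:
                    self.replicas[n] = winner
        self.pending_repairs.clear()


cluster = ToyCluster({
    "node2": Column("123", 1),   # missed the update while it was down
    "node3": Column("A123", 2),
    "node4": Column("A123", 2),
})
stale = cluster.read_one("node2")    # still the old column
cluster.run_background_repairs()
fresh = cluster.read_one("node2")    # repaired by now
```

In this model the first CL.ONE read of node2 sees the stale column and
only a later read sees the repaired one, while read_quorum over node2 and
node3 returns the newest column on the first call.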


> AES assuming Anti-Entropy - Tried nodetool repair, this also did not
> fix the issue.
>

That is odd.  Are you sure repair had finished before the test?  In 0.6,
nodetool doesn't block until repair completes (but does in 0.7)


> The only time I saw it being fixed was a read to Node 3 or Node 4
> (mostly Node 3 which acted as responsible when Node 2 was down)  and
> then 2nd reads to Node 2 started showing the latest data.
>

Right, node 3/4 was authoritative for CL.ONE since it had a copy of the
data.  After you read it, RR worked and fixed node 2.


> Do you think RR and AES should have definitely worked ?


I suspect RR worked, and you weren't waiting on AES long enough (it
essentially performs a major compaction, and then streams whatever needs to
be sent to the others, which can take a while depending on the volume of
data.)
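
A rough sketch of the idea behind that repair step, in Python. Everything
here is an assumption made for illustration: NUM_RANGES, bucket_of,
range_hashes and repair are hypothetical names, and a flat list of
per-range hashes stands in for the hash tree the replicas compare. Only
ranges whose hashes differ are exchanged, and the newest timestamp wins.

```python
import hashlib

NUM_RANGES = 4  # hypothetical; a real repair works on token ranges


def bucket_of(key):
    """Map a key to one of NUM_RANGES buckets."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % NUM_RANGES


def range_hashes(rows):
    """Hash the contents of each bucket; rows maps key -> (value, timestamp)."""
    buckets = [[] for _ in range(NUM_RANGES)]
    for key in sorted(rows):
        value, ts = rows[key]
        buckets[bucket_of(key)].append(f"{key}:{value}:{ts}")
    return [hashlib.sha256("|".join(b).encode()).hexdigest() for b in buckets]


def repair(a, b):
    """Sync only the buckets whose hashes differ; return those bucket ids."""
    hashes_a, hashes_b = range_hashes(a), range_hashes(b)
    differing = [i for i in range(NUM_RANGES) if hashes_a[i] != hashes_b[i]]
    for key in set(a) | set(b):
        if bucket_of(key) in differing:
            candidates = [rows[key] for rows in (a, b) if key in rows]
            # The copy with the highest timestamp is written to both sides.
            a[key] = b[key] = max(candidates, key=lambda c: c[1])
    return differing


node2 = {"1005": ("123", 1), "1006": ("x", 1)}
node3 = {"1005": ("A123", 2), "1006": ("x", 1)}
changed = repair(node2, node3)
```

After the call both dictionaries hold ("A123", 2) for key 1005, and a
second repair finds no differing ranges. Hashing every row is the
expensive part, which is in line with the wait described above.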

-Brandon

Re: Question on Hinted Handoff

Posted by Joe Alex <jo...@gmail.com>.
Thanks, that clarifies why HH did not work, So have to use .7. Is
there .6.7 ? I am using .6.6 now.
I did see the log entry in Node 4, adding a Hint for Node 2 and when
Node 2 came up noticed a log entry in Node 4 that 1 row Handed Off -
so I thought it was working.

About RR - a read in Node 1 or Node 2 indicated a RR (see log entry
above) was happening. But don't think that ever happened. Those entries
keep repeating every time a read in Node 1 and 2 and kept giving old
data.

AES assuming Anti-Entropy - Tried nodetool repair, this also did not
fix the issue.

The only time I saw it being fixed was a read to Node 3 or Node 4
(mostly Node 3 which acted as responsible when Node 2 was down)  and
then 2nd reads to Node 2 started showing the latest data.

Do you think RR and AES should have definitely worked ?


On Tue, Nov 2, 2010 at 1:01 AM, Brandon Williams <dr...@gmail.com> wrote:
> On Mon, Nov 1, 2010 at 8:28 PM, Joe Alex <jo...@gmail.com> wrote:
>>
>> My expectation was even though Node 2 was down key written to Node 3
>> or 4 should be updated in Node 2 using Hint and the subsequent reads
>> to Node 1 or Node 2 itself should have got the latest value
>
> Your expectation is correct, unfortunately I broke HH in 0.6.5 with a bad
> backport and it affects
> 0.6.6: https://issues.apache.org/jira/browse/CASSANDRA-1656
> But, that is why we have Read Repair and AES.  It's worth noting that HH is
> best effort anyway, if Node 2 was down but the failure detector hadn't
> noticed yet, no hint would be created and you'd have the same behavior.
> -Brandon

Re: Question on Hinted Handoff

Posted by Brandon Williams <dr...@gmail.com>.
On Mon, Nov 1, 2010 at 8:28 PM, Joe Alex <jo...@gmail.com> wrote:
>
> My expectation was even though Node 2 was down key written to Node 3
> or 4 should be updated in Node 2 using Hint and the subsequent reads
> to Node 1 or Node 2 itself should have got the latest value


Your expectation is correct, unfortunately I broke HH in 0.6.5 with a bad
backport and it affects 0.6.6:
https://issues.apache.org/jira/browse/CASSANDRA-1656
But, that is why we have Read Repair and AES.  It's worth noting that HH is
best effort anyway, if Node 2 was down but the failure detector hadn't
noticed yet, no hint would be created and you'd have the same behavior.
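
To illustrate the best-effort point, a small Python sketch. It is a toy
model under stated assumptions, not the actual HH implementation: the
Coordinator class and its fields are hypothetical, and hints are kept in
one list for brevity (in the test above the hint was stored on Node 4).
A hint is recorded only when the failure detector already believes the
target is down.

```python
class Coordinator:
    """Hypothetical write path for one set of replicas."""

    def __init__(self, names):
        self.replicas = {n: {} for n in names}      # node -> key -> (value, ts)
        self.actually_up = {n: True for n in names}
        self.believed_up = {n: True for n in names}  # failure detector's view
        self.hints = []                              # (target, key, value, ts)

    def write(self, key, value, ts):
        for name, store in self.replicas.items():
            if not self.believed_up[name]:
                # Known to be down: keep a hint for later delivery.
                self.hints.append((name, key, value, ts))
            elif self.actually_up[name]:
                store[key] = (value, ts)
            # Otherwise the node is down but not yet detected: the write
            # never arrives and no hint is created.

    def deliver_hints(self, target):
        """Replay hints for a node that has come back up."""
        remaining = []
        for name, key, value, ts in self.hints:
            if name == target and self.actually_up[name]:
                current = self.replicas[name].get(key)
                if current is None or current[1] < ts:
                    self.replicas[name][key] = (value, ts)
            else:
                remaining.append((name, key, value, ts))
        self.hints = remaining


coord = Coordinator(["node2", "node3", "node4"])
coord.write("1005", "123", 1)
coord.actually_up["node2"] = False   # node2 dies, detector has not noticed
coord.write("1005", "A123", 2)       # node2 misses this write, no hint
```

At this point node2 still holds ("123", 1) and coord.hints is empty, so
only Read Repair or AES can bring it up to date.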

-Brandon

Re: Question on Hinted Handoff

Posted by Joe Alex <jo...@gmail.com>.
My findings - would be nice if somebody could verify them.
It is critical for our eval to verify that HintedHandOff, ReadRepair and
AntiEntropy work as we think they do

Node 1, 2, 3, 4
RF=3

All nodes up - Node 2 is responsible for key 1005
Write CL=ONE, Insert key 1005, value=123 in Node 1
Node 2, 3, 4 gets data
Read CL=ONE  Read on all 4 nodes gets value 123

Node 2 is down now
Write CL=ONE, Insert key 1005, value=A123 in Node 1
Node 3, 4 gets data
Node 4 adds Hint for Node 2
Read CL=ONE  Read on all 3 Node 1, 3, 4 gets value A123.

Node 2 is brought back up
Hint for 1 row is Handed off by Node 4
Read CL=ONE  Read on Node 1, 2, gets old value 123.
See log entries:
Received responses in DataRepairHandler : ID : 72
FROM:/Node 3
TYPE:RESPONSE_STAGE
VERB:READ_RESPONSE
Received responses in DataRepairHandler : ID : 72
FROM:/Node 4
TYPE:RESPONSE_STAGE
VERB:READ_RESPONSE

Read CL=ONE  Read on Node 3, 4, gets new value A123.
Most of the time, after a Read on Node 3 or 4, Node 1 and 2 start
showing the latest updated A123 (updated when Node 2 was down)

My expectation was even though Node 2 was down key written to Node 3
or 4 should be updated in Node 2 using Hint and the subsequent reads
to Node 1 or Node 2 itself should have got the latest value



On Mon, Nov 1, 2010 at 4:06 PM, Joe Alex <jo...@gmail.com> wrote:
> To keep the question simple,
> If an insert or remove Key happens when the responsible Node is down
> (RF=3) what is the expected behavior when the Node comes back up ?
>
> For example Key 1005 was removed when Node 2 was down. When Node 2
> came back up it started showing back ?
>

Re: Question on Hinted Handoff

Posted by Joe Alex <jo...@gmail.com>.
To keep the question simple,
If an insert or remove Key happens when the responsible Node is down
(RF=3) what is the expected behavior when the Node comes back up ?

For example Key 1005 was removed when Node 2 was down. When Node 2
came back up it started showing back ?
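
A tiny Python sketch of the remove case, assuming a delete is stored as a
timestamped marker (a tombstone) and reconciled like any other write. The
names TOMBSTONE, reconcile and is_visible are made up for illustration.

```python
TOMBSTONE = None  # hypothetical marker meaning "deleted at this timestamp"


def reconcile(a, b):
    """Return the (value, timestamp) pair with the higher timestamp."""
    return a if a[1] >= b[1] else b


def is_visible(cell):
    """A key shows up in reads only if its newest cell is not a tombstone."""
    return cell[0] is not TOMBSTONE


node2_cell = ("123", 1)        # Node 2 was down and never saw the remove
node3_cell = (TOMBSTONE, 2)    # Node 3 recorded the remove

before = is_visible(node2_cell)                        # key still shows on Node 2
after = is_visible(reconcile(node2_cell, node3_cell))  # tombstone wins
```

Until Node 2 receives the tombstone through a hint, a read repair or a
repair run, a read served from Node 2 alone can still show the key.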
