Posted to user@cassandra.apache.org by Oleg Anastasjev <ol...@gmail.com> on 2010/07/16 14:52:58 UTC

Node failback scenario: How can clients be sure hinted handoff is completed ?

Hello all,

I am currently testing various HA scenarios on a small Cassandra
cluster of 8 nodes with RF=3. My test environment has Thrift clients that
write every operation both to the Cassandra cluster and to a reliable store,
and cross-check read results between the two. Reads are performed at CL=ONE
because of latency requirements. I tested how failover and failback work.

I found that on failback, many data mismatches between the reliable store and
Cassandra were detected as soon as the failed-back node started to accept reads.
Later, once hinted handoff had completed, no more mismatches were
reported.

So the idea is that, even with CL=ONE, we could have almost no inconsistencies
on node failback if the Cassandra node started accepting reads only after
hinted handoff had completed.

So the question is: is it possible for a Thrift client to know the current
hinted handoff status of a node that has just failed back?
That way, clients could avoid querying the just-failed-back node until HH
completes, and reroute queries to other endpoints while the node
synchronizes itself with the cluster.
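
To make the proposed client behaviour concrete, here is a minimal sketch of the routing idea only. Everything in it is hypothetical: the `ReadRouter` class, its method names, and above all the `handoff_done` predicate, which stands in for whatever status check might be available. Nothing in this thread confirms that a Thrift client can actually obtain hinted handoff status.

```python
from typing import Callable, Iterable, List, Set


class ReadRouter:
    """Hypothetical client-side helper that keeps reads away from
    nodes that have just failed back until handoff is reported done."""

    def __init__(self, hosts: Iterable[str], handoff_done: Callable[[str], bool]):
        self._hosts: List[str] = list(hosts)
        # Assumption: the caller supplies a way to tell whether hinted
        # handoff for a host has finished. This is a placeholder.
        self._handoff_done = handoff_done
        self._quarantined: Set[str] = set()

    def mark_failed_back(self, host: str) -> None:
        """Record that a host has just rejoined and may hold stale data."""
        self._quarantined.add(host)

    def read_candidates(self) -> List[str]:
        """Return hosts that are currently considered safe for CL=ONE reads."""
        # Release any quarantined host whose handoff is reported complete.
        self._quarantined = {
            h for h in self._quarantined if not self._handoff_done(h)
        }
        candidates = [h for h in self._hosts if h not in self._quarantined]
        # If every host is quarantined, fall back to all hosts
        # rather than refusing to serve reads.
        return candidates or list(self._hosts)


# Usage with a fake status source (a plain set) in place of a real check.
finished: Set[str] = set()
router = ReadRouter(["node1", "node2", "node3"], lambda h: h in finished)
router.mark_failed_back("node2")
print(router.read_candidates())
finished.add("node2")
print(router.read_candidates())
```

Note that the client usually does not choose which replica serves a CL=ONE read when the contacted node acts as coordinator, so a filter like this covers only the choice of host the client connects to.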



Re: Node failback scenario: How can clients be sure hinted handoff is completed ?

Posted by Jonathan Ellis <jb...@gmail.com>.
If you can't accept out of date data you shouldn't be reading at
CL.ONE.  Making HH more complex is not the answer.
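
As background to the CL.ONE point: a read is guaranteed to see the latest successful write only when the number of replicas read plus the number of replicas written exceeds the replication factor. A small sketch of that arithmetic for the RF=3 cluster described in the question (the function names are illustrative, not part of any Cassandra API):

```python
def quorum(rf: int) -> int:
    """Number of replicas that make up a QUORUM for replication factor rf."""
    return rf // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, rf: int) -> bool:
    """True if every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > rf


RF = 3
ONE = 1
QUORUM = quorum(RF)  # 2 when RF is 3

for name, r, w in [
    ("read ONE, write ONE", ONE, ONE),
    ("read ONE, write QUORUM", ONE, QUORUM),
    ("read QUORUM, write QUORUM", QUORUM, QUORUM),
    ("read ONE, write ALL", ONE, RF),
]:
    print(f"{name}: overlap guaranteed = {overlaps(r, w, RF)}")
```

With RF=3, reads at ONE overlap every write only if writes go to all three replicas; QUORUM reads combined with QUORUM writes give the same guarantee while tolerating one node being down.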

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com