Posted to user@cassandra.apache.org by Sergey Olefir <so...@gmail.com> on 2013/02/07 16:20:57 UTC

Range Queries consistency in an inconsistent cluster.

Hi,

I'm somewhat lost with regard to the results I can expect from running range
queries in a (temporarily) 'inconsistent' cluster (e.g. if a node has been
down for some time and hasn't caught up yet).

Suppose I have 4 nodes in 2 DCs (Cassandra 1.1.7):
DCa: a1 and a2
DCb: b1 and b2
I'm using the ByteOrdered partitioner and the nodes are balanced (tokens are
set properly to split data evenly in each DC, tokens in DCb are [DCa + 1]).

I'm running with replication DCa:2, DCb:2 (each node contains the full data).
I'm using counters only and I'm putting a heavy load on the cluster (say 10k
increments per second). The writes are directed to a1 and a2 only; b1 and b2
are for backup and possibly for running queries against (haven't decided
yet). I monitor the cluster via nodetool and see that the data load is even
on all nodes (as expected).

Now a2 goes down. I can immediately see that the data load on a1 grows very
rapidly (because of hints for a2). After half an hour a2 comes back up. I
know from experience that it'll take hours before all hints from a1 have
been sent to a2.

What is going to happen with range queries directed to a1 & a2 while a2
catches up?

As far as I understand, there's no read repair when doing range queries, so
the usual assurance of "wrong once, correct next time around" doesn't apply.

- Does the consistency level setting apply to range queries?
- If I direct a query to a1 (which is up-to-date), will it go to a2 for the
slice that 'belongs' to a2 (even though a1 has a full replica of the data)?
- If I direct a query to a2 (which is NOT up-to-date), is it smart enough to
go to a1 for data?
- In general, considering I have a cluster with 3 nodes up-to-date and one
that is not, is there a way to run a query that'll return up-to-date data
(i.e. will not use data from a2)?

Also, what if a2 has been down for longer than the hint window (1 hour by
default)? Is Cassandra smart enough to avoid using a2 for range queries
while it is inconsistent?

Thanks in advance,
Sergey

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Range-Queries-consistency-in-an-inconsistent-cluster-tp7585400.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.

Re: Range Queries consistency in an inconsistent cluster.

Posted by Sergey Olefir <so...@gmail.com>.

So you're basically saying that read consistency levels do affect range
queries? With a high enough level I should be able to get the 'correct' data
regardless of some node(s) being behind the times?

My first read through https://issues.apache.org/jira/browse/CASSANDRA-967
left me with the impression that range queries are not that safe ("This isn't
quite right – it will miss repairing any rows that don't exist on the first
node at all."), but on review it seems that this comment might apply
only to the specific case of consistency level ONE.
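
The concern about rows that "don't exist on the first node at all" can be
illustrated with a toy model. This is only a sketch and not Cassandra's
actual code: the replica contents, the range_read helper and the
(timestamp, value) layout are all made up for illustration, and it models
regular timestamped columns rather than counters, which are merged
differently.

```python
# Toy model: each replica maps row key -> (timestamp, value).
# All names and data here are hypothetical, for illustration only.
a1 = {"k1": (2, "new"), "k2": (1, "x"), "k3": (1, "y")}  # up to date
a2 = {"k1": (1, "old"), "k2": (1, "x")}                   # stale: k3 missing, k1 outdated
b1 = dict(a1)                                             # up to date


def range_read(replicas, start, end):
    """Merge the rows in [start, end] from every contacted replica.

    For each key the value with the highest timestamp wins, and a row is
    returned if at least one contacted replica has it.
    """
    merged = {}
    for replica in replicas:
        for key, (timestamp, value) in replica.items():
            if start <= key <= end:
                if key not in merged or timestamp > merged[key][0]:
                    merged[key] = (timestamp, value)
    return {key: merged[key][1] for key in sorted(merged)}


# Contacting only the stale replica (roughly what a read at ONE may do).
stale_only = range_read([a2], "k1", "k3")

# Contacting three replicas (roughly what a read at QUORUM would do here).
merged = range_read([a2, a1, b1], "k1", "k3")

print(stale_only)
print(merged)
```

In this model the single-replica read has no way of knowing that k3 exists,
while the three-replica read picks it up from the other responses.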


Edward Capriolo wrote
> Range queries do not currently read repair, although there is a ticket
> on this. If you want them to be consistent, do them at QUORUM or ALL.
> But, in a strange quirk, since get_range_slices does not repair, those
> operations are not "eventually consistent".

Re: Range Queries consistency in an inconsistent cluster.

Posted by Edward Capriolo <ed...@gmail.com>.

Range queries do not currently read repair, although there is a ticket
on this. If you want them to be consistent, do them at QUORUM or ALL.
But, in a strange quirk, since get_range_slices does not repair, those
operations are not "eventually consistent".
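
To put numbers on the QUORUM or ALL suggestion for the cluster described in
the question (replication DCa:2, DCb:2, so four replicas per row), here is a
rough sketch of the replica counts involved. The function names are
hypothetical and not part of any Cassandra client; the only thing assumed is
the usual quorum formula, floor(RF / 2) + 1, and the common rule of thumb
that a read sees the latest write when read replicas + write replicas > RF.
Counter writes follow their own path, so treat this purely as replica
counting.

```python
# Replication settings taken from the question: two replicas in each DC.
replication = {"DCa": 2, "DCb": 2}
total_rf = sum(replication.values())  # 4 replicas per row in total


def quorum(replication_factor):
    """Usual quorum size: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level):
    """Number of replicas that must respond at a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(total_rf)
    if level == "ALL":
        return total_rf
    raise ValueError("unsupported level in this sketch: " + level)


def overlaps(write_level, read_level):
    """True if every read set must share at least one replica with every write set."""
    return replicas_required(write_level) + replicas_required(read_level) > total_rf


for write_level in ("ONE", "QUORUM"):
    for read_level in ("ONE", "QUORUM", "ALL"):
        print(write_level, read_level, overlaps(write_level, read_level))
```

With these settings QUORUM means 3 of 4 replicas. Under this counting,
writes at ONE followed by reads at QUORUM are not guaranteed to overlap
(1 + 3 is not greater than 4), whereas QUORUM writes with QUORUM reads, or
reads at ALL, are.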
