Posted to user@cassandra.apache.org by graham sanderson <gr...@vast.com> on 2013/11/10 07:12:24 UTC

Question about consistency levels

I’m trying to be more succinct this time, since I got no answers on my last attempt.

We are currently using 2.0.2 in test (no C* in production yet), and use a (LOCAL_)QUORUM CL on reads and writes, which guarantees (if successful) that we read the latest data, since a quorum read always overlaps a successful quorum write in at least one replica.

That said, it is highly likely that (LOCAL_)ONE would also return our data, since the data isn’t read until quite some time after it is written.

Given that we must do our best to return data, we want to see what options we have when a quorum read fails. Say 2 of 3 replicas go down: with RF=3 a quorum needs 2 replicas, so the read cannot succeed. (Note that we have also seen quorum reads fail because of bugs around CF deletion/re-creation during compaction or load causing data corruption, in which case a single bad node can screw things up.)

One option is to fall back to (LOCAL_)ONE on the client side when we detect the right exception from the (LOCAL_)QUORUM read, but that obviously degrades consistency.

That said, we ONLY ever do idempotent writes and NEVER delete. So once again I wonder whether there is a (reasonable) use case for a CL whereby you accept the first non-empty response from any replica?
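
For illustration, a rough sketch of the fallback option above (DataStax Java driver 2.0 style; the keyspace, table and helper name are placeholders, not our real schema):

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.exceptions.ReadTimeoutException;
import com.datastax.driver.core.exceptions.UnavailableException;

public class QuorumReadWithFallback {
    // Placeholder query; "my_ks.my_table" and "id" are made-up names.
    static ResultSet read(Session session, Object key) {
        Statement stmt = new SimpleStatement(
                "SELECT * FROM my_ks.my_table WHERE id = ?", key)
                .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
        try {
            return session.execute(stmt);
        } catch (UnavailableException | ReadTimeoutException e) {
            // Not enough replicas answered at LOCAL_QUORUM; degrade to
            // LOCAL_ONE and accept possibly stale (but, for us, idempotent
            // and never-deleted) data rather than failing the read outright.
            stmt.setConsistencyLevel(ConsistencyLevel.LOCAL_ONE);
            return session.execute(stmt);
        }
    }
}

I believe the DataStax Java driver also ships a DowngradingConsistencyRetryPolicy that does something like this automatically, if per-query handling is too noisy.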

Re: Question about consistency levels

Posted by graham sanderson <gr...@vast.com>.
Thanks Richard,

Note that the SSTable corruption probably only happens as a result of some testing patterns we’re using. That said, we still want to make sure we can handle it if it does happen (since the corrupted nodes will NOT be marked down and will thus still receive traffic).

In our particular use case, we’re happy to have writes fail completely; for reads we’re OK with short-term stale data, as long as it is known-consistent rather than of unknown consistency.

We will certainly look at dropping from QUORUM to ONE on reads, as this seems (as you say) better than returning nothing. If we try really hard we can actually tell whether C* has returned the right data, since the stored data is referenced by external indexes. (Your point about multiple data centers is well made, though right now data served out of our backup data center will not be coming from C*, since we don’t have a cluster there; we have yet to decide whether we’ll build one cross-data-center C* cluster or one cluster in each DC, splitting the data further upstream.)

That said, if anyone wants to chime in on the other point, about a new consistency level that returns the first non-empty/non-tombstone result, I’d be really interested.

Thanks,

Graham.

RE: Question about consistency levels

Posted by Richard Lowe <ri...@arkivum.com>.
We're using Cassandra 1.1 with the Hector 1.1 library. We've found that reducing the CL when an exception occurs is useful, as it's usually easier to deal with things not being consistent for a few seconds than with the database read/write not succeeding at all.

We have multiple DCs and use NetworkTopologyStrategy to strictly govern where data is replicated. Because of this, LOCAL_QUORUM isn't good enough: if a write occurs in DC A then nodes in DC B won't necessarily have the data before a read in DC B wants the (maybe updated) data. 

We therefore use ALL, EACH_QUORUM, QUORUM, LOCAL_QUORUM, ONE, ANY for the write CL, falling back to the next in the list if a CL can't be achieved. With high-consistency writes we can get away with lower-consistency reads, so we use LOCAL_QUORUM, ONE for the read CL, again falling back to the next level if an error occurs. We also find it useful to make several attempts at each CL to account for network faults, which can be common on a congested/slow/unreliable network (especially over a WAN) or on busy nodes. We only fail the operation once we've exhausted all CLs and retries.

This approach works well for us: we log a warning if a lower CL was used or retries were required, and we've seen only a handful of these on our nodes, and then only during very busy periods. It helps that our use case allows us occasionally to take a couple of extra seconds to perform the database op if we need to; yours may not, I don't know.
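
Roughly, the ladder looks like this (a sketch only; the Operation interface and retry count are made up for illustration, this is not the Hector API):

import java.util.Arrays;
import java.util.List;

enum CL { ALL, EACH_QUORUM, QUORUM, LOCAL_QUORUM, ONE, ANY }

interface Operation<T> {
    T execute(CL level) throws Exception;   // throws on unavailable/timeout
}

class ConsistencyLadder {
    static final List<CL> WRITE_LADDER = Arrays.asList(
            CL.ALL, CL.EACH_QUORUM, CL.QUORUM, CL.LOCAL_QUORUM, CL.ONE, CL.ANY);
    static final List<CL> READ_LADDER = Arrays.asList(CL.LOCAL_QUORUM, CL.ONE);
    static final int RETRIES_PER_LEVEL = 3;   // illustrative number

    // Try each CL in order with a few retries at each level; warn whenever
    // anything other than the first CL's first attempt succeeded; fail only
    // once every CL and retry has been exhausted.
    static <T> T run(Operation<T> op, List<CL> ladder) throws Exception {
        Exception last = null;
        for (CL level : ladder) {
            for (int attempt = 1; attempt <= RETRIES_PER_LEVEL; attempt++) {
                try {
                    T result = op.execute(level);
                    if (level != ladder.get(0) || attempt > 1) {
                        System.err.println("WARN: op needed CL " + level
                                + ", attempt " + attempt);
                    }
                    return result;
                } catch (Exception e) {
                    last = e;   // fault or too few replicas: retry, then step down
                }
            }
        }
        throw last;   // all CLs and retries exhausted
    }
}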

We haven't had any problems with errors from corrupt SSTables but maybe that's because we're using a different version of Cassandra, a different client and likely have different read/write/delete usage.

Hope that helps.
