Posted to user@cassandra.apache.org by Fd Habash <fm...@gmail.com> on 2019/05/07 15:05:12 UTC

Is There a Way To Proactively Monitor Reads Returning No Data Due to Consistency Level?

Typically, when a read is submitted to C*, it may complete with one of the following outcomes:
1. No errors & returns expected data
2. Errors out with UnavailableException
3. No error, but returns zero rows on the first attempt; the row is returned on subsequent runs.

The third scenario happens as a result of cluster entropy, especially during unexpected outages affecting on-premises or cloud infrastructure.

A typical sequence of events:
a) Multiple nodes fail in the cluster
b) Node replaced via bootstrapping
c) The row is in Cassandra, but the client hits nodes that do not have the data yet and gets zero rows. The row is retrieved on the third or fourth attempt, and read repair takes care of it.
d) Eventually, repair is run and the issue is fixed.

Digging into Cassandra metrics, I’ve found ‘cassandra.unavailables.count’. However, this metric appears to capture only scenario 2 (UnavailableException).

I have also read the Yelp article describing a metric they called ‘underreplicated keyspaces’. These are keyspace ranges that will fail to satisfy reads/writes at a certain CL due to insufficient endpoints. If my understanding is correct, this also measures scenario 2.

I’m trying to find a metric that captures scenario 3 above. Is this possible at all?



----------------
Thank you


Re: Is There a Way To Proactively Monitor Reads Returning No Data Due to Consistency Level?

Posted by Jeff Jirsa <jj...@gmail.com>.
Short answer is no, because missing consistency isn’t an error and there’s no way to know you’ve missed data without reading at ALL, and if it were ok to read at ALL you’d already be doing it (it’s not ok for most apps).
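To make this concrete, here is a minimal Python sketch (a toy model, not Cassandra's actual read path) of one row with replication factor 3 where only one replica received the write, as can happen after nodes fail and are replaced without repair. The names `REPLICAS_CONTACTED`, `read`, and `miss_rate` are hypothetical. Each read contacts as many randomly chosen replicas as the consistency level requires; when none of them holds the row, the client simply sees no row and gets no error.

```python
import random

# Hypothetical model: each replica is a dict mapping a key to (write_timestamp, value).
# A replica that lacks the key never received the write, e.g. a replacement node
# that bootstrapped from a replica that had also missed it.
REPLICAS_CONTACTED = {"ONE": 1, "QUORUM": 2, "ALL": 3}  # assumes RF = 3


def read(replicas, key, cl, rng):
    """Contact as many random replicas as `cl` requires and return the newest value."""
    contacted = rng.sample(replicas, REPLICAS_CONTACTED[cl])
    versions = [replica[key] for replica in contacted if key in replica]
    if not versions:
        return None  # to the client, indistinguishable from a row that never existed
    return max(versions)[1]  # last write wins, compared by timestamp


def miss_rate(replicas, key, cl, trials=10_000, seed=42):
    """Fraction of reads at `cl` that come back empty even though the row exists."""
    rng = random.Random(seed)
    misses = sum(read(replicas, key, cl, rng) is None for _ in range(trials))
    return misses / trials


if __name__ == "__main__":
    # Only replica 0 holds the row; replicas 1 and 2 missed the write.
    replicas = [{"k": (1, "row")}, {}, {}]
    for cl in ("ONE", "QUORUM", "ALL"):
        print(cl, miss_rate(replicas, "k", cl))
```

In this model a read at ONE misses the row roughly two times out of three and a read at QUORUM roughly one time out of three, while a read at ALL never misses it. Noticing the miss requires comparing against an ALL read, which is the cost most applications cannot pay on every request.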

> On May 7, 2019, at 8:05 AM, Fd Habash <fm...@gmail.com> wrote:
> 
> Typically, when a read is submitted to C*, it may complete with one of the following outcomes:
> 1. No errors & returns expected data
> 2. Errors out with UnavailableException

It can also error out with a timeout (ReadTimeoutException).

> 3. No error, but returns zero rows on the first attempt; the row is returned on subsequent runs.

It can also return stale or incomplete data, not just no data.
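The same kind of toy model (again hypothetical Python, not Cassandra internals) shows the stale and incomplete cases. It assumes a partition with three rows, where one update reached only replica `a` and one insert reached only replica `b`, for example because they were written at ONE while the other replicas were down. The hypothetical `merge` function stands in for the coordinator: it combines whatever the contacted replicas return, keeping the newest timestamp per row, and ignores tombstones, TTLs, and per-cell merging.

```python
def merge(replica_views):
    """Merge rows from the contacted replicas, keeping the newest timestamp per row."""
    merged = {}
    for view in replica_views:
        for row_key, (ts, value) in view.items():
            if row_key not in merged or ts > merged[row_key][0]:
                merged[row_key] = (ts, value)
    return {row_key: value for row_key, (_, value) in merged.items()}


# Each replica maps row_key -> (write_timestamp, value).
a = {"r1": (1, "v1"), "r2": (2, "v2-new")}                   # got the r2 update, missed r3
b = {"r1": (1, "v1"), "r2": (1, "v2-old"), "r3": (3, "v3")}  # missed the r2 update, got r3
c = {"r1": (1, "v1"), "r2": (1, "v2-old")}                   # missed both

if __name__ == "__main__":
    print(merge([c]))        # stale r2 and no r3, yet it looks like a valid answer
    print(merge([b, c]))     # r3 is present, but r2 is still stale
    print(merge([a, b, c]))  # only contacting every replica returns the latest state
```

None of these reads raises an error, so the client cannot tell the partial answers apart from a correct one.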

>  
> The third scenario happens as a result of cluster entropy, especially during unexpected outages affecting on-premises or cloud infrastructure.
>  
> A typical sequence of events:
> a) Multiple nodes fail in the cluster
> b) Node replaced via bootstrapping

You must run repair among the remaining replicas before the replacement if you want to maintain consistency guarantees.
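Why repair has to come first can be checked with a little arithmetic on replica sets. The hypothetical helpers below (illustrative Python only) list every quorum of replicas that shares no node with the replicas actually holding a write. With RF = 3 and a write acknowledged at QUORUM by replicas 0 and 1, every 2-replica read overlaps the write because R + W > N (2 + 2 > 3). The sketch then assumes replica 0 fails and its replacement streams the range from replica 2, which had missed the write; the write now lives only on replica 1, so a QUORUM read of replicas 0 and 2 misses it.

```python
from itertools import combinations


def quorum(n):
    """Replicas needed for QUORUM with replication factor n."""
    return n // 2 + 1


def reads_that_miss(n, holders, r):
    """Every r-replica read set that shares no node with the replicas in `holders`."""
    return [s for s in combinations(range(n), r) if not set(s) & set(holders)]


if __name__ == "__main__":
    rf = 3
    # Before the failure: a QUORUM write landed on replicas 0 and 1.
    print(reads_that_miss(rf, {0, 1}, quorum(rf)))  # [] because 2 + 2 > 3
    # Replica 0 is replaced by streaming from replica 2, which missed the write.
    print(reads_that_miss(rf, {1}, quorum(rf)))     # [(0, 2)]
    # Repairing replicas 1 and 2 first leaves at least {1, 2} holding the write.
    print(reads_that_miss(rf, {1, 2}, quorum(rf)))  # []
```

Exhaustive enumeration is fine here because replication factors are small.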

> c) The row is in Cassandra, but the client hits nodes that do not have the data yet and gets zero rows. The row is retrieved on the third or fourth attempt, and read repair takes care of it.
> d) Eventually, repair is run and the issue is fixed.
>  
> Digging into Cassandra metrics, I’ve found ‘cassandra.unavailables.count’. However, this metric appears to capture only scenario 2 (UnavailableException).
>  
> I have also read the Yelp article describing a metric they called ‘underreplicated keyspaces’. These are keyspace ranges that will fail to satisfy reads/writes at a certain CL due to insufficient endpoints. If my understanding is correct, this also measures scenario 2.
>  
> I’m trying to find a metric that captures scenario 3 above. Is this possible at all?
>  
>  
>  
> ----------------
> Thank you
>