Posted to user@cassandra.apache.org by Tobias Eriksson <to...@qvantel.com> on 2020/08/11 05:10:13 UTC

Why a READ REPAIR ?

Hi
We have a Cassandra solution with 2 DCs, where each DC has more than 30 nodes.
From time to time we see problems with READ REPAIR, but I am stuck on the analysis.
The faults follow this pattern:

  1.  INSERT with Local Quorum (2 out of 3)
  2.  Wait 0.5 to 1 second
  3.  READ with Local Quorum (2 out of 3)
     *   Triggers a read repair
  4.  Then we do an UPDATE …

The replication factor is 3.
As I see it, in (1) the data is certainly stored on 2 out of 3 replicas, and I would be surprised if it did not also reach the 3rd node within 0.5 sec.
So how come the read in (3) can’t get a proper response from 2 out of 3?
Some say the problem started when we added DC2, but I can’t see how that could be, since our query uses Local Quorum and should involve only DC1.
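
The quorum arithmetic behind steps (1) and (3) can be sketched in a few lines of Python. This is only an illustration of the counting argument, not Cassandra code; the function names are made up for the example. With RF = 3 a quorum is 2, so a write acknowledged by 2 replicas can leave 1 replica behind, and a quorum read that happens to include that replica can still succeed while seeing a mismatch between the replicas it contacted.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read_overlaps_write(rf: int, write_acks: int, read_responses: int) -> bool:
    """True when any read set must share at least one replica with any write set."""
    return write_acks + read_responses > rf


rf = 3
q = quorum(rf)

print(q)                              # replicas needed per quorum operation
print(read_overlaps_write(rf, q, q))  # quorum write + quorum read always overlap
print(rf - q)                         # replicas that may still lag when the write returns
```

Under these assumptions the read in step (3) is guaranteed to reach at least one replica that has the INSERT, but nothing guarantees that both contacted replicas have it yet.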

How can I debug this fault ?
How can I track if the data has reached all 3 nodes ?

All ideas are welcome
-Tobias



Re: Why a READ REPAIR ?

Posted by manish khandelwal <ma...@gmail.com>.
Hi Tobias

READ 2 will not be blocked by the read repair triggered by READ 1.

Regards
Manish

On Tue, Aug 11, 2020 at 6:02 PM Tobias Eriksson <to...@qvantel.com>
wrote:

> Thanx Erick,
>
> Perhaps this is super obvious but I need a confirmation as you say “…not
> subsequent reads for other data unrelated to the read being repaired…”
>
> But this is subsequent reads to the _*same*_ partition key
>
> So to be more explicit
>
> READ 1 with Local Quorum : SELECT * FROM products WHERE id = ABC123
>
> READ 2 with Local One : SELECT * FROM products WHERE id = ABC123
>
>
>
> Would read (2) be blocked by the READ REPAIR that was done by read (1)
>
> As I understand that the read repair is working not on the whole table but
> on the partition key it had problems with
>
>
>
> -Tobias
>
>
>
>
>
> *From: *Erick Ramirez <er...@datastax.com>
> *Reply to: *"user@cassandra.apache.org" <us...@cassandra.apache.org>
> *Date: *Tuesday, 11 August 2020 at 11:26
> *To: *"user@cassandra.apache.org" <us...@cassandra.apache.org>
> *Subject: *Re: Why a READ REPAIR ?
>
>
>
> If a READ triggers a READ REPAIR, and then if we do an additional READ
> would then that BLOCK until the “first” READ REPAIR would be done ?
>
> -Tobias
>
>
>
> Not all read repairs are blocking RRs (aka foreground RRs). There are also
> background RRs which by definition are non-blocking because they happen in
> the background.
>
>
>
> In response to your question, the "additional read" is not blocked. In a
> blocking RR, if there is a mismatch in the data returned to the coordinator
> from the replicas involved in the query (determined by the read consistency
> level) then the coordinator sends a repair to the out-of-sync replica in
> the foreground before sending the result back to the client so the read is
> blocked until the RR is completed. To reiterate, only the read involved in
> the RR is blocked -- not subsequent reads for other data unrelated to the
> read being repaired. Cheers!
>
>
>

Re: Why a READ REPAIR ?

Posted by Erick Ramirez <er...@datastax.com>.
You can check for the string "digest mismatch" in the logs. Similarly, you
can track the RR stats in nodetool netstats and the dropped mutations in
nodetool tpstats.
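
A minimal Python sketch of the log check, assuming you have a node's log file available locally. The function names are hypothetical, the log path is left to the caller because the thread does not name one, and the sample lines are made up rather than real Cassandra output.

```python
import re
from typing import Iterable, List

# Case-insensitive, since the exact capitalisation in the log is not given in the thread.
DIGEST_MISMATCH = re.compile(r"digest mismatch", re.IGNORECASE)


def find_digest_mismatches(lines: Iterable[str]) -> List[str]:
    """Return every line that mentions a digest mismatch, without its trailing newline."""
    return [line.rstrip("\n") for line in lines if DIGEST_MISMATCH.search(line)]


def scan_log(path: str) -> List[str]:
    """Scan one log file; `path` is whichever log file your node writes to."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_digest_mismatches(handle)


# Made-up sample lines, not real Cassandra log output.
sample = [
    "sample line 1: read completed\n",
    "sample line 2: Digest mismatch on some key\n",
    "sample line 3: write completed\n",
]
print(find_digest_mismatches(sample))
```

Running `scan_log` against the logs of each replica and comparing timestamps with the dropped-mutation counts from nodetool tpstats may help narrow down which node missed the write.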

To be clear though, RRs are a side-effect of nodes either dropping
mutations or being unresponsive so they miss mutations. RRs do *not* cause
nodes to be unresponsive or unavailable. RRs are Cassandra's way of
automatically dealing with the "outages" -- they don't cause them. Cheers!


Re: Why a READ REPAIR ?

Posted by Tobias Eriksson <to...@qvantel.com>.
Thanx Erick
Is there a way to turn on tracing based on certain criteria?
I would like to start tracing when there is some sort of failure, i.e. in this case when a READ REPAIR is triggered, as I would like to know why we sometimes can’t reach one of the nodes.
-Tobias

From: Erick Ramirez <er...@datastax.com>
Reply to: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Wednesday, 12 August 2020 at 01:30
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: Why a READ REPAIR ?

Perhaps this is super obvious but I need a confirmation as you say “…not subsequent reads for other data unrelated to the read being repaired…”
But this is subsequent reads to the _same_ partition key
So to be more explicit
READ 1 with Local Quorum : SELECT * FROM products WHERE id = ABC123
READ 2 with Local One : SELECT * FROM products WHERE id = ABC123

Would read (2) be blocked by the READ REPAIR that was done by read (1)
As I understand that the read repair is working not on the whole table but on the partition key it had problems with

Not at all. We (the community) are continuing to invest more on improving the docs because these things are not obvious or well-understood.

The read request is blocked while the repair of replicas is in progress. To be clear, it is the request (singular) that is blocked and not some locking mechanism at the partition level (say like LWTs). Other requests (read 2 in your case) will continue to work. Cheers!

Re: Why a READ REPAIR ?

Posted by Erick Ramirez <er...@datastax.com>.
>
> Perhaps this is super obvious but I need a confirmation as you say “…not
> subsequent reads for other data unrelated to the read being repaired…”
>
> But this is subsequent reads to the _*same*_ partition key
>
> So to be more explicit
>
> READ 1 with Local Quorum : SELECT * FROM products WHERE id = ABC123
>
> READ 2 with Local One : SELECT * FROM products WHERE id = ABC123
>
>
>
> Would read (2) be blocked by the READ REPAIR that was done by read (1)
>
> As I understand that the read repair is working not on the whole table but
> on the partition key it had problems with
>

Not at all. We (the community) are continuing to invest more in improving
the docs because these things are not obvious or well-understood.

The read *request* is blocked while the repair of replicas is in
progress. To be clear, it is the *request* (singular) that is blocked and
*not* some locking mechanism at the partition level (say like LWTs). Other
requests (read 2 in your case) will continue to work. Cheers!

Re: Why a READ REPAIR ?

Posted by Tobias Eriksson <to...@qvantel.com>.
Thanx Erick,
Perhaps this is super obvious, but I need confirmation, since you say “…not subsequent reads for other data unrelated to the read being repaired…”
But these are subsequent reads to the _same_ partition key.
So, to be more explicit:
READ 1 with Local Quorum : SELECT * FROM products WHERE id = ABC123
READ 2 with Local One : SELECT * FROM products WHERE id = ABC123

Would read (2) be blocked by the READ REPAIR triggered by read (1)?
As I understand it, the read repair works not on the whole table but only on the partition key it had problems with.

-Tobias


From: Erick Ramirez <er...@datastax.com>
Reply to: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Tuesday, 11 August 2020 at 11:26
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: Why a READ REPAIR ?

If a READ triggers a READ REPAIR, and then if we do an additional READ would then that BLOCK until the “first” READ REPAIR would be done ?
-Tobias

Not all read repairs are blocking RRs (aka foreground RRs). There are also background RRs which by definition are non-blocking because they happen in the background.

In response to your question, the "additional read" is not blocked. In a blocking RR, if there is a mismatch in the data returned to the coordinator from the replicas involved in the query (determined by the read consistency level) then the coordinator sends a repair to the out-of-sync replica in the foreground before sending the result back to the client so the read is blocked until the RR is completed. To reiterate, only the read involved in the RR is blocked -- not subsequent reads for other data unrelated to the read being repaired. Cheers!


Re: Why a READ REPAIR ?

Posted by Erick Ramirez <er...@datastax.com>.
>
> If a READ triggers a READ REPAIR, and then if we do an additional READ
> would then that BLOCK until the “first” READ REPAIR would be done ?
>
> -Tobias
>

Not all read repairs are blocking RRs (aka foreground RRs). There are also
background RRs which by definition are non-blocking because they happen in
the background.

In response to your question, the "additional read" is not blocked. In a
blocking RR, if there is a mismatch in the data returned to the coordinator
from the replicas involved in the query (determined by the read consistency
level) then the coordinator sends a repair to the out-of-sync replica in
the foreground before sending the result back to the client so the read is
blocked until the RR is completed. To reiterate, only the read involved in
the RR is blocked -- not subsequent reads for other data unrelated to the
read being repaired. Cheers!
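
The blocking behaviour described above can be sketched with a toy model in Python. This is a simplification under stated assumptions, not how Cassandra is implemented: replicas are plain in-memory objects, values carry a write timestamp, and the coordinator compares full values rather than digests. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Cell = Tuple[str, int]  # (value, write timestamp)


@dataclass
class Replica:
    name: str
    data: Dict[str, Cell] = field(default_factory=dict)

    def read(self, key: str) -> Optional[Cell]:
        return self.data.get(key)

    def write(self, key: str, cell: Cell) -> None:
        # Last write wins: keep the cell with the highest timestamp.
        current = self.data.get(key)
        if current is None or cell[1] > current[1]:
            self.data[key] = cell


def coordinator_read(key: str, contacted: List[Replica]) -> Tuple[Optional[Cell], List[str]]:
    """Read from the contacted replicas and repair stale ones before returning."""
    responses = [(replica, replica.read(key)) for replica in contacted]
    present = [cell for _, cell in responses if cell is not None]
    if not present:
        return None, []
    newest = max(present, key=lambda cell: cell[1])
    repaired = []
    for replica, cell in responses:
        if cell != newest:
            # The repair happens before the result is returned,
            # so only this request waits for it.
            replica.write(key, newest)
            repaired.append(replica.name)
    return newest, repaired


a, b, c = Replica("a"), Replica("b"), Replica("c")
for replica in (a, b):  # the write reached 2 of 3 replicas
    replica.write("ABC123", ("v1", 100))

result, repaired = coordinator_read("ABC123", [b, c])  # read 1 contacts a stale replica
print(result, repaired)
print(coordinator_read("ABC123", [a]))                 # read 2 touches no shared lock
```

Nothing in the model holds a lock on the key: the second call runs independently of the first, which mirrors the point that only the request performing the repair is delayed.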

Re: Why a READ REPAIR ?

Posted by Tobias Eriksson <to...@qvantel.com>.
If a READ triggers a READ REPAIR and we then do an additional READ, would that second READ BLOCK until the “first” READ REPAIR is done?
-Tobias

From: Jeff Jirsa <jj...@gmail.com>
Reply to: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Tuesday, 11 August 2020 at 07:30
To: cassandra <us...@cassandra.apache.org>
Subject: Re: Why a READ REPAIR ?


Your schema may have read repair (non-blocking, background) set to 10% (0.1, for dclocal).
You may have GC pauses causing writes (or reads) to be delayed.
You may be hitting a cassandra bug.

Would need the `TRACING` output to know for sure.



Re: Why a READ REPAIR ?

Posted by Jeff Jirsa <jj...@gmail.com>.
Your table schema may have background (non-blocking) read repair set to 10%
(dclocal_read_repair_chance = 0.1).
You may have GC pauses causing writes (or reads) to be delayed.
You may be hitting a Cassandra bug.

Would need the `TRACING` output to know for sure.
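
One way to collect trace output from application code is sketched below, assuming the DataStax Python driver (`cassandra-driver`) and an already connected `session`. The `products` table and `id` column come from the queries in this thread; treating `id` as a text column, and expecting the trace events to mention a digest mismatch in the same words as the logs, are both assumptions that may not hold for your schema or Cassandra version.

```python
from typing import Iterable, List


def digest_mismatch_events(descriptions: Iterable[str]) -> List[str]:
    """Keep the trace event descriptions that mention a digest mismatch."""
    return [d for d in descriptions if "digest mismatch" in d.lower()]


def traced_read(session, product_id: str) -> List[str]:
    """Run one LOCAL_QUORUM read with tracing on and return any mismatch events."""
    # Imported here so the helper above can be used without the driver installed.
    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    statement = SimpleStatement(
        "SELECT * FROM products WHERE id = %s",
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    )
    result = session.execute(statement, (product_id,), trace=True)
    trace = result.get_query_trace()
    return digest_mismatch_events(event.description for event in trace.events)
```

A call such as `traced_read(session, "ABC123")` right after the INSERT would then show whether that particular read saw a mismatch. Tracing adds load, so it is probably best limited to the reads under investigation.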

