Posted to user@cassandra.apache.org by Oleksandr Shulgin <ol...@zalando.de> on 2019/08/13 16:14:18 UTC

Re: To Repair or Not to Repair

On Thu, Mar 14, 2019 at 9:55 PM Jonathan Haddad <jo...@jonhaddad.com> wrote:

> My coworker Alex (from The Last Pickle) wrote an in-depth blog post on
> TWCS.  We recommend not running repair on tables that use TWCS.
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html
>

Hi,

I was wondering about this again, as I've noticed one of the nodes in our
cluster accumulating ten times the number of files compared to the average
across the rest of the cluster.  The files are all coming from a table that
uses TWCS, and a repair (run with Reaper) is ongoing.  The sudden growth
started around 24 hours ago, when the affected node was restarted due to a
failing AWS EC2 system check.  Now I'm thinking again whether we should be
running those repairs at all. ;-)

In the Summary of the blog post linked above, the following is written:

It is advised to disable read repair on TWCS tables, and use an aggressive
tombstone purging strategy as digest mismatches during reads will still
trigger read repairs.


Was it meant to read "disable anti-entropy repair" instead?  I find it
confusing otherwise.
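
For concreteness, the table-level settings I understand that advice to
refer to would be something like the sketch below (using the DataStax
Python driver; the contact point, keyspace/table name, window size and
thresholds are placeholders, and the read_repair_chance options only exist
up to Cassandra 3.x):

# Sketch only: disable read repair and enable aggressive tombstone purging
# on a TWCS table.  All names and values below are placeholders/examples.
from cassandra.cluster import Cluster  # pip install cassandra-driver

cluster = Cluster(["127.0.0.1"])  # assumed contact point
session = cluster.connect()

session.execute("""
    ALTER TABLE my_ks.my_twcs_table
    WITH compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'DAYS',
        'compaction_window_size': '1',
        'unchecked_tombstone_compaction': 'true',
        'tombstone_threshold': '0.2',
        'tombstone_compaction_interval': '86400'
    }
    AND read_repair_chance = 0.0
    AND dclocal_read_repair_chance = 0.0
""")

cluster.shutdown()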

Regards,
--
Alex

Odd number of files on one node during repair (was: To Repair or Not to Repair)

Posted by Oleksandr Shulgin <ol...@zalando.de>.
On Tue, Aug 13, 2019 at 6:14 PM Oleksandr Shulgin <oleksandr.shulgin@zalando.de> wrote:

>
> I was wondering about this again, as I've noticed one of the nodes in our
> cluster accumulating ten times the number of files compared to the average
> across the rest of the cluster.  The files are all coming from a table that
> uses TWCS, and a repair (run with Reaper) is ongoing.  The sudden growth
> started around 24 hours ago, when the affected node was restarted due to a
> failing AWS EC2 system check.
>

And now that the next weekly repair has started, the same node is showing
the problem again.  The number of files has gone up to 6,000 in the last 7
hours, compared to an average of ~1,500 on the rest of the nodes, which
remains more or less constant.
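
For what it's worth, a rough way to get these per-table counts on a node is
something like the sketch below; it assumes the default
/var/lib/cassandra/data layout, so adjust the path if needed.

# Count SSTable Data components per table directory on one node.
from collections import Counter
from pathlib import Path

DATA_DIR = Path("/var/lib/cassandra/data")   # assumed default data directory

counts = Counter()
for f in DATA_DIR.glob("*/*/*-Data.db"):     # <keyspace>/<table-uuid>/<sstable>-Data.db
    counts[f.parent.name] += 1               # group by table directory

for table_dir, n in counts.most_common(10):  # ten biggest tables by SSTable count
    print(f"{n:6d}  {table_dir}")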

Any advice on how to debug this?

Regards,
--
Alex