Posted to commits@cassandra.apache.org by "Jeremy Hanna (JIRA)" <ji...@apache.org> on 2018/05/11 13:41:00 UTC
[jira] [Updated] (CASSANDRA-13327) Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement
[ https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeremy Hanna updated CASSANDRA-13327:
-------------------------------------
Labels: LWT (was: )
> Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement
> -----------------------------------------------------------------------------------
>
> Key: CASSANDRA-13327
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13327
> Project: Cassandra
> Issue Type: Bug
> Components: Coordination
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Priority: Major
> Labels: LWT
>
> Consider this ring:
> 127.0.0.1 MR UP JOINING -7301836195843364181
> 127.0.0.2 MR UP NORMAL -7263405479023135948
> 127.0.0.3 MR UP NORMAL -7205759403792793599
> 127.0.0.4 MR DOWN NORMAL -7148113328562451251
> where 127.0.0.1 was bootstrapping for cluster expansion. Note that, due to the failure of 127.0.0.4, 127.0.0.1 was stuck trying to stream from it and making no progress.
> Then the down node was replaced so we had:
> 127.0.0.1 MR UP JOINING -7301836195843364181
> 127.0.0.2 MR UP NORMAL -7263405479023135948
> 127.0.0.3 MR UP NORMAL -7205759403792793599
> 127.0.0.5 MR UP JOINING -7148113328562451251
> The ring output is confusing: the first JOINING is a genuine bootstrap, while the second is a replacement. We now had CAS unavailables (but no non-CAS unavailables). I think it’s because the pending endpoints check thinks that 127.0.0.5 is gaining a range when it’s just replacing.
> The workaround is to kill the stuck JOINING node, but Cassandra shouldn’t unnecessarily fail these requests.
> It also appears that the number of required participants is bumped by 1 during a host replacement, so if the replacing host fails you will get unavailables and timeouts.
> This is related to the check added in CASSANDRA-8346.
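
The arithmetic behind the report can be sketched as a small model. This is not Cassandra's code: it assumes the check rejects a CAS request when more than one endpoint is pending for the key, and that the required participant count is a majority of natural plus pending endpoints. The function name check_cas, the CasCheck type, a replication factor of 3, and the choice of 127.0.0.2, 127.0.0.3 and 127.0.0.4 as the natural replicas are all hypothetical, chosen only to mirror the ring above.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class CasCheck:
    required: int    # participants needed for a Paxos round (assumed: majority)
    live: int        # natural + pending endpoints that are currently up
    available: bool
    reason: str


def check_cas(natural: List[str], pending: List[str], up: Set[str]) -> CasCheck:
    """Simplified model of the CAS availability check (assumed behaviour)."""
    participants = len(natural) + len(pending)
    required = participants // 2 + 1
    live = sum(1 for endpoint in natural + pending if endpoint in up)
    # Assumed rule: more than one pending endpoint is rejected outright,
    # whether the endpoint is bootstrapping or replacing a dead node.
    if len(pending) > 1:
        return CasCheck(required, live, False, "more than one pending endpoint")
    if live < required:
        return CasCheck(required, live, False, "not enough live participants")
    return CasCheck(required, live, True, "ok")


# Hypothetical replica set for one key, assuming RF=3.
NATURAL = ["127.0.0.2", "127.0.0.3", "127.0.0.4"]

# Stuck bootstrap (127.0.0.1) plus replacement (127.0.0.5): two pending endpoints.
stuck = check_cas(NATURAL, ["127.0.0.1", "127.0.0.5"],
                  {"127.0.0.1", "127.0.0.2", "127.0.0.3", "127.0.0.5"})

# Workaround: the stuck JOINING node is killed, only the replacement is pending.
workaround = check_cas(NATURAL, ["127.0.0.5"],
                       {"127.0.0.2", "127.0.0.3", "127.0.0.5"})

# The replacing host then fails: required is 3 of 4, but only 2 are up.
replacement_fails = check_cas(NATURAL, ["127.0.0.5"],
                              {"127.0.0.2", "127.0.0.3"})

# For comparison, no pending endpoints: required is 2 of 3.
no_pending = check_cas(NATURAL, [], {"127.0.0.2", "127.0.0.3"})
```

Under these assumptions, the model treats the replacing node like any other pending endpoint, which is why it both trips the size check and raises the required count from 2 to 3.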