Posted to dev@cassandra.apache.org by Josh McKenzie <jm...@apache.org> on 2022/09/20 17:08:21 UTC

[Discuss] CASSANDRA-17896, Gossip, and foot guns

Ticket for reference: https://issues.apache.org/jira/browse/CASSANDRA-17896

Context: "We should expose a system env (-D) param to advanced operators to have the ability to specify the replace_addresses_token to be used during host replacement in cases where Gossip gets into a bad state."

My question for the dev list: *should* we expose this parameter and functionality even if it's heavily documented as being highly unsafe and a big foot gun? Clusters can get into states where you effectively can't bootstrap a replacement without nuking it and starting over and manually intervening / twiddling with peers tables. This parameter lets us work around that a bit more gracefully as operators, but if you do this the wrong way it opens up a world of hurt.

Given CEP-21 is on the horizon (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata) I'm leaning towards closing this out as Won't Fix but leaving the branch linked in the event someone runs into this and wants to hotfix it into a local build or something; I'm assuming CEP-21 will land before the next major which would make this redundant.

What does everyone else think?

~Josh

Re: [Discuss] CASSANDRA-17896, Gossip, and foot guns

Posted by Patrick McFadin <pm...@gmail.com>.
IIRC that is something you can already change in JMX? If that's the case, I
say leave that as the barrier to entry into the "parameters of doom."

CEP-21 is the right path forward. It addresses the root cause instead of
creating more ways to fix how you got there. This is the best thing for end
users.

Patrick
