Posted to user@cassandra.apache.org by Stefano Ortolani <os...@gmail.com> on 2015/02/11 06:13:34 UTC

Recommissioned a node

Hi,

I recommissioned a node after decommissioning it.
That happened (1) after a successful decommission (checked), (2) without
wiping the data directory on the node, and (3) simply by restarting the
Cassandra service. The node now reports itself as healthy, up, and running.

Knowing that I issued the "repair" command and patiently waited for its
completion, can I assume that the cluster and its internals (replicas, and
the balance between them) are healthy and "as new"?
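
For concreteness, the sequence was roughly the following (a sketch assuming
the standard nodetool commands and an init-style service wrapper; the exact
commands and the default /var/lib/cassandra paths may differ on your setup):

    # on the node being removed
    nodetool decommission            # finished cleanly; ranges streamed away, node left the ring
    # later, on the same node, data directory left in place
    sudo service cassandra restart   # node came back and rejoined the ring with its old data
    nodetool repair                  # issued once it was back up, and allowed to run to completion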

Regards,
Stefano

Re: Recommissioned a node

Posted by Stefano Ortolani <os...@gmail.com>.
Definitely; I think exactly the same about this issue.

On Thu, Feb 12, 2015 at 7:04 AM, Eric Stevens <mi...@gmail.com> wrote:

> I definitely find it surprising that a node which was decommissioned is
> willing to rejoin a cluster.  I can't think of any legitimate scenario
> where you'd want that, and I'm surprised the node doesn't track that it was
> decommissioned and refuse to rejoin without at least a -D flag to force it.
>
> It's way too easy for a node to get restarted by, for example, a naive
> service status checker, or a scheduled reboot after a kernel upgrade, and
> so forth. You may also have been decommissioning the node because of
> hardware issues, in which case you may also be threatening the stability or
> performance characteristics of the cluster, and at absolute best you have
> short-term consistency issues and near-100% overstreaming to get the node
> decommissioned again.
>
> IMO, especially with the threat of unrecoverable consistency violations,
> this should be a critical bug.
>
> On Wed, Feb 11, 2015 at 12:39 PM, Jonathan Haddad <jo...@jonhaddad.com>
> wrote:
>
>> And after decreasing your RF (rare, but it happens).
>>
>> On Wed Feb 11 2015 at 11:31:38 AM Robert Coli <rc...@eventbrite.com>
>> wrote:
>>
>>> On Wed, Feb 11, 2015 at 11:20 AM, Jonathan Haddad <jo...@jonhaddad.com>
>>> wrote:
>>>
>>>> It could, because the tombstones that mark data as deleted may have been
>>>> removed.  There would be nothing that says "this data is gone".
>>>>
>>>> If you're worried about it, turn up your gc_grace_seconds.  Also, don't
>>>> revive nodes back into a cluster with old data sitting on them.
>>>>
>>>
>>> Also, run cleanup after range movements:
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-7764
>>>
>>> =Rob
>>>
>>>
>>
>

Re: Recommissioned a node

Posted by Eric Stevens <mi...@gmail.com>.
I created an issue for this:
https://issues.apache.org/jira/browse/CASSANDRA-8801

On Thu, Feb 12, 2015 at 10:18 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Thu, Feb 12, 2015 at 7:04 AM, Eric Stevens <mi...@gmail.com> wrote:
>
>> IMO, especially with the threat of unrecoverable consistency violations,
>> this should be a critical bug.
>>
>
> You should file a JIRA, and let the list know what it is? :D
>
> I was never sure, if I'm honest, whether it was just me being unreasonably
> literal in presuming that decommission made the node forget its prior state.
> It is nice to hear from other operators that this matches their expectations.
> But yes, the current behavior seems to have risks that "forgetting"
> doesn't, and I don't understand what benefits (if any) it has.
>
> As a brief aside, this is Yet Another Reason why you probably don't ever
> want a Cassandra node to automatically start on boot, or restart. If you
> don't know its configuration, it could join a cluster, which might be
> Meaningfully Bad in some circumstances.
>
> =Rob
>

Re: Recommissioned a node

Posted by Robert Coli <rc...@eventbrite.com>.
On Thu, Feb 12, 2015 at 7:04 AM, Eric Stevens <mi...@gmail.com> wrote:

> IMO, especially with the threat of unrecoverable consistency violations,
> this should be a critical bug.
>

You should file a JIRA, and let the list know what it is? :D

I was never sure, if I'm honest, whether it was just me being unreasonably
literal in presuming that decommission made the node forget its prior state.
It is nice to hear from other operators that this matches their expectations.
But yes, the current behavior seems to have risks that "forgetting"
doesn't, and I don't understand what benefits (if any) it has.

As a brief aside, this is Yet Another Reason why you probably don't ever
want a Cassandra node to automatically start on boot, or restart. If you
don't know its configuration, it could join a cluster, which might be
Meaningfully Bad in some circumstances.
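
For what it's worth, on a systemd-based host something like the following
keeps it from coming back on its own (a sketch only; on older init systems
the equivalent would be "update-rc.d cassandra disable" or "chkconfig
cassandra off", and your packaging may differ):

    sudo systemctl disable cassandra    # don't start Cassandra automatically at boot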

=Rob

Re: Recommissioned a node

Posted by Eric Stevens <mi...@gmail.com>.
I definitely find it surprising that a node which was decommissioned is
willing to rejoin a cluster.  I can't think of any legitimate scenario
where you'd want that, and I'm surprised the node doesn't track that it was
decommissioned and refuse to rejoin without at least a -D flag to force it.

It's way too easy for a node to get restarted by, for example, a naive
service status checker, or a scheduled reboot after a kernel upgrade, and so
forth. You may also have been decommissioning the node because of hardware
issues, in which case you may also be threatening the stability or
performance characteristics of the cluster, and at absolute best you have
short-term consistency issues and near-100% overstreaming to get the node
decommissioned again.

IMO, especially with the threat of unrecoverable consistency violations,
this should be a critical bug.

On Wed, Feb 11, 2015 at 12:39 PM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> And after decreasing your RF (rare, but it happens).
>
> On Wed Feb 11 2015 at 11:31:38 AM Robert Coli <rc...@eventbrite.com>
> wrote:
>
>> On Wed, Feb 11, 2015 at 11:20 AM, Jonathan Haddad <jo...@jonhaddad.com>
>> wrote:
>>
>>> It could, because the tombstones that mark data as deleted may have been
>>> removed.  There would be nothing that says "this data is gone".
>>>
>>> If you're worried about it, turn up your gc_grace_seconds.  Also, don't
>>> revive nodes back into a cluster with old data sitting on them.
>>>
>>
>> Also, run cleanup after range movements:
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-7764
>>
>> =Rob
>>
>>
>

Re: Recommissioned a node

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
And after decreasing your RF (rare, but it happens).

On Wed Feb 11 2015 at 11:31:38 AM Robert Coli <rc...@eventbrite.com> wrote:

> On Wed, Feb 11, 2015 at 11:20 AM, Jonathan Haddad <jo...@jonhaddad.com>
> wrote:
>
>> It could, because the tombstones that mark data as deleted may have been
>> removed.  There would be nothing that says "this data is gone".
>>
>> If you're worried about it, turn up your gc_grace_seconds.  Also, don't
>> revive nodes back into a cluster with old data sitting on them.
>>
>
> Also, run cleanup after range movements:
>
> https://issues.apache.org/jira/browse/CASSANDRA-7764
>
> =Rob
>
>

Re: Recommissioned a node

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Feb 11, 2015 at 11:20 AM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> It could, because the tombstones that mark data as deleted may have been
> removed.  There would be nothing that says "this data is gone".
>
> If you're worried about it, turn up your gc_grace_seconds.  Also, don't
> revive nodes back into a cluster with old data sitting on them.
>

Also, run cleanup after range movements:

https://issues.apache.org/jira/browse/CASSANDRA-7764
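
(That is, something along these lines on each node that still holds data for
ranges it no longer owns after the topology change; the keyspace argument is
optional and the name is only illustrative:)

    nodetool cleanup my_keyspace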

=Rob

Re: Recommissioned a node

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
It could, because the tombstones that mark data as deleted may have been
removed.  There would be nothing that says "this data is gone".

If you're worried about it, turn up your gc_grace_seconds.  Also, don't
revive nodes back into a cluster with old data sitting on them.
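
Per table, that is roughly the following (a sketch; the keyspace/table names
and the value are only illustrative):

    cqlsh -e "ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 1728000;"  # 20 days instead of the 10-day default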

On Wed Feb 11 2015 at 11:18:19 AM Stefano Ortolani <os...@gmail.com>
wrote:

> Hi Robert,
>
> it all happened within 30 minutes, so well before the default
> gc_grace_seconds (864000 seconds, i.e. 10 days), so I should be fine.
> However, this is quite shocking if you ask me. The mere possibility of
> getting into an inconsistent state just by restarting a node is appalling...
>
> Can other people confirm that a restart after gc_grace_seconds had passed
> would have violated consistency permanently?
>
> Cheers,
> Stefano
>
> On Wed, Feb 11, 2015 at 10:56 AM, Robert Coli <rc...@eventbrite.com>
> wrote:
>
>> On Tue, Feb 10, 2015 at 9:13 PM, Stefano Ortolani <os...@gmail.com>
>> wrote:
>>
>>> I recommissioned a node after decommissioning it.
>>> That happened (1) after a successful decommission (checked), (2) without
>>> wiping the data directory on the node, and (3) simply by restarting the
>>> Cassandra service. The node now reports itself as healthy, up, and running.
>>>
>>> Knowing that I issued the "repair" command and patiently waited for its
>>> completion, can I assume that the cluster and its internals (replicas, and
>>> the balance between them) are healthy and "as new"?
>>>
>>
>> Did you recommission before or after gc_grace_seconds passed? If after,
>> you have violated consistency in a manner that, in my understanding, one
>> cannot recover from.
>>
>> If before, you're pretty fine.
>>
>> However, this is a longstanding issue that I personally consider a bug:
>>
>> Your decommissioned node doesn't forget its state. In my opinion, once
>> you have told it to leave the cluster, it should forget everything it
>> knew as a member of that cluster.
>>
>> If you file this behavior as a JIRA bug, please let the list know.
>>
>> =Rob
>>
>>
>

Re: Recommissioned a node

Posted by Stefano Ortolani <os...@gmail.com>.
Hi Robert,

it all happened within 30 minutes, so well before the default
gc_grace_seconds (864000 seconds, i.e. 10 days), so I should be fine.
However, this is quite shocking if you ask me. The mere possibility of
getting into an inconsistent state just by restarting a node is appalling...

Can other people confirm that a restart after gc_grace_seconds had passed
would have violated consistency permanently?

Cheers,
Stefano

On Wed, Feb 11, 2015 at 10:56 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Feb 10, 2015 at 9:13 PM, Stefano Ortolani <os...@gmail.com>
> wrote:
>
>> I recommissioned a node after decommissioning it.
>> That happened (1) after a successful decommission (checked), (2) without
>> wiping the data directory on the node, and (3) simply by restarting the
>> Cassandra service. The node now reports itself as healthy, up, and running.
>>
>> Knowing that I issued the "repair" command and patiently waited for its
>> completion, can I assume that the cluster and its internals (replicas, and
>> the balance between them) are healthy and "as new"?
>>
>
> Did you recommission before or after gc_grace_seconds passed? If after,
> you have violated consistency in a manner that, in my understanding, one
> cannot recover from.
>
> If before, you're pretty fine.
>
> However, this is a longstanding issue that I personally consider a bug:
>
> Your decommissioned node doesn't forget its state. In my opinion, once you
> have told it to leave the cluster, it should forget everything it knew as
> a member of that cluster.
>
> If you file this behavior as a JIRA bug, please let the list know.
>
> =Rob
>
>

Re: Recommissioned a node

Posted by Robert Coli <rc...@eventbrite.com>.
On Tue, Feb 10, 2015 at 9:13 PM, Stefano Ortolani <os...@gmail.com>
wrote:

> I recommissioned a node after decommissioning it.
> That happened (1) after a successful decommission (checked), (2) without
> wiping the data directory on the node, and (3) simply by restarting the
> Cassandra service. The node now reports itself as healthy, up, and running.
>
> Knowing that I issued the "repair" command and patiently waited for its
> completion, can I assume that the cluster and its internals (replicas, and
> the balance between them) are healthy and "as new"?
>

Did you recommission before or after gc_grace_seconds passed? If after, you
have violated consistency in a manner that, in my understanding, one cannot
recover from.

If before, you're pretty fine.
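
You can check the per-table value in cqlsh, for example with DESCRIBE (the
table name below is only illustrative); look for gc_grace_seconds in the
output, which defaults to 864000 seconds, i.e. 10 days:

    cqlsh -e "DESCRIBE TABLE my_keyspace.my_table;"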

However, this is a longstanding issue that I personally consider a bug:

Your decommissioned node doesn't forget its state. In my opinion, once you
have told it to leave the cluster, it should forget everything it knew as a
member of that cluster.

If you file this behavior as a JIRA bug, please let the list know.

=Rob

Re: Recommissioned a node

Posted by Eric Stevens <mi...@gmail.com>.
Yes, including the system keyspace data and the commitlog directory.  Then,
when it starts, it's like a brand-new node and will bootstrap to join.
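
Roughly, with Cassandra stopped on that node, and assuming the default
package locations (your data_file_directories, commitlog_directory and
saved_caches_directory in cassandra.yaml may point elsewhere):

    sudo rm -rf /var/lib/cassandra/data/* \
                /var/lib/cassandra/commitlog/* \
                /var/lib/cassandra/saved_caches/*
    sudo service cassandra start    # comes up as a brand-new node and bootstraps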

On Wed, Feb 11, 2015 at 8:56 AM, Stefano Ortolani <os...@gmail.com>
wrote:

> Hi Eric,
>
> thanks for your answer. The reason it got recommissioned was simply that
> the machine got restarted (with auto_bootstrap set to true). A cleaner
> (and correct) recommission would have just required wiping the data
> folder, am I correct? Or would I have needed to change something else in
> the node configuration?
>
> Cheers,
> Stefano
>
> On Wed, Feb 11, 2015 at 6:47 AM, Eric Stevens <mi...@gmail.com> wrote:
>
>> AFAIK it should be OK after the repair completed (it was missing all
>> writes while it was decommissioning and while it was offline, and nobody
>> would have been keeping hinted handoffs for it, so repair was the right
>> thing to do).  Unless RF=N, you're now due for a cleanup on the other nodes.
>>
>> Generally speaking, though, this was probably not a good idea.  When the
>> node came back online, it rejoined the cluster immediately and would have
>> been serving client requests without a consistent view of the data.
>> A safer approach would be to wipe the data directory and bootstrap it as a
>> clean new member.
>>
>> I'm curious what prompted that cycle of decommission then recommission.
>>
>> On Tue, Feb 10, 2015 at 10:13 PM, Stefano Ortolani <os...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I recommissioned a node after decommissioning it.
>>> That happened (1) after a successful decommission (checked), (2) without
>>> wiping the data directory on the node, and (3) simply by restarting the
>>> Cassandra service. The node now reports itself as healthy, up, and running.
>>>
>>> Knowing that I issued the "repair" command and patiently waited for its
>>> completion, can I assume that the cluster and its internals (replicas, and
>>> the balance between them) are healthy and "as new"?
>>>
>>> Regards,
>>> Stefano
>>>
>>
>>
>

Re: Recommissioned a node

Posted by Stefano Ortolani <os...@gmail.com>.
Hi Eric,

thanks for your answer. The reason it got recommissioned was simply that
the machine got restarted (with auto_bootstrap set to true). A cleaner (and
correct) recommission would have just required wiping the data folder, am I
correct? Or would I have needed to change something else in the node
configuration?

Cheers,
Stefano

On Wed, Feb 11, 2015 at 6:47 AM, Eric Stevens <mi...@gmail.com> wrote:

> AFAIK it should be OK after the repair completed (it was missing all
> writes while it was decommissioning and while it was offline, and nobody
> would have been keeping hinted handoffs for it, so repair was the right
> thing to do).  Unless RF=N, you're now due for a cleanup on the other nodes.
>
> Generally speaking, though, this was probably not a good idea.  When the
> node came back online, it rejoined the cluster immediately and would have
> been serving client requests without a consistent view of the data.
> A safer approach would be to wipe the data directory and bootstrap it as a
> clean new member.
>
> I'm curious what prompted that cycle of decommission then recommission.
>
> On Tue, Feb 10, 2015 at 10:13 PM, Stefano Ortolani <os...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I recommissioned a node after decommissioning it.
>> That happened (1) after a successful decommission (checked), (2) without
>> wiping the data directory on the node, and (3) simply by restarting the
>> Cassandra service. The node now reports itself as healthy, up, and running.
>>
>> Knowing that I issued the "repair" command and patiently waited for its
>> completion, can I assume that the cluster and its internals (replicas, and
>> the balance between them) are healthy and "as new"?
>>
>> Regards,
>> Stefano
>>
>
>

Re: Recommissioned a node

Posted by Eric Stevens <mi...@gmail.com>.
AFAIK it should be OK after the repair completed (it was missing all writes
while it was decommissioning and while it was offline, and nobody would
have been keeping hinted handoffs for it, so repair was the right thing to
do).  Unless RF=N, you're now due for a cleanup on the other nodes.
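
In other words, something like this (a sketch; the keyspace arguments are
omitted, so it covers everything):

    nodetool repair      # on the node that rejoined (already done in your case)
    nodetool cleanup     # on each of the other nodes, to drop data they no longer own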

Generally speaking, though, this was probably not a good idea.  When the node
came back online, it rejoined the cluster immediately and would have been
serving client requests without a consistent view of the data.  A safer
approach would be to wipe the data directory and bootstrap it as a clean new
member.

I'm curious what prompted that cycle of decommission then recommission.

On Tue, Feb 10, 2015 at 10:13 PM, Stefano Ortolani <os...@gmail.com>
wrote:

> Hi,
>
> I recommissioned a node after decommissioning it.
> That happened (1) after a successful decommission (checked), (2) without
> wiping the data directory on the node, and (3) simply by restarting the
> Cassandra service. The node now reports itself as healthy, up, and running.
>
> Knowing that I issued the "repair" command and patiently waited for its
> completion, can I assume that the cluster and its internals (replicas, and
> the balance between them) are healthy and "as new"?
>
> Regards,
> Stefano
>