
Posted to user@cassandra.apache.org by Fd Habash <fm...@gmail.com> on 2017/12/05 21:38:51 UTC

When Replacing a Node, How to Force a Consistent Bootstrap

Assume I have a cluster of 3 nodes (A, B, C) with a replication factor of 3. Row x was written with CL=LOCAL_QUORUM (LQ) to nodes A and B. Before it was written to C, node B crashed. I replaced B, and the replacement bootstrapped its data from node C.

Now row x is missing from both B and C. If node A then crashes, its replacement will bootstrap from either B or C, so row x is gone from the entire ring.

Is this scenario possible at all (at least in C* < 3.0)?

How can a newly replaced node be forced to bootstrap from the node in the replica set that has the most recent data? 

Otherwise, we have to repair a node immediately after bootstrapping it for a node replacement.

Thank you
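The failure sequence described above can be modelled in a few lines. This is a toy sketch, not Cassandra code: each replica is a plain dict of key to (value, timestamp), the write reaches only A and B, and hinted handoff and read repair are deliberately left out (the assumption being that C never receives the mutation). A replacement is modelled as copying the range from a single live replica.

```python
RF = 3

def quorum(rf):
    """Replicas required for QUORUM / LOCAL_QUORUM within one DC: floor(rf / 2) + 1."""
    return rf // 2 + 1

def write(replicas, key, value, ts, reachable):
    """Apply a write to the reachable replicas only and return the number of acks.
    Hinted handoff and read repair are intentionally not modelled."""
    acks = 0
    for name, data in replicas.items():
        if name in reachable:
            data[key] = (value, ts)
            acks += 1
    return acks

def replace(replicas, dead, source):
    """Toy replacement bootstrap: the new node copies the range from ONE live replica."""
    replicas[dead] = dict(replicas[source])

replicas = {"A": {}, "B": {}, "C": {}}
acks = write(replicas, "x", "v1", ts=1, reachable={"A", "B"})
assert acks >= quorum(RF)  # the LOCAL_QUORUM write succeeds with 2 of 3 acks

replace(replicas, dead="B", source="C")  # B crashes; its replacement streams from C
replace(replicas, dead="A", source="C")  # later A crashes; only B or C can be the source

print([name for name, data in replicas.items() if "x" in data])  # []
```

Both replacements copy from a replica that never saw row x, so the acknowledged write disappears from every replica.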

RE: When Replacing a Node, How to Force a Consistent Bootstrap

Posted by Fd Habash <fm...@gmail.com>.

“ … but it's better to repair before and after if possible …”

After the replacement, I simply run ‘nodetool repair --full’ on the replaced node. But before bootstrapping, if my cluster is distributed over 3 AZs, what do I repair? All the nodes in the other AZs? As was pointed out earlier, I can use ‘nodetool repair -hosts’, but how do I identify which specific hosts to repair?

Thanks 

----------------
Thank you
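One way to approach the which-hosts question without knowing a partition key is to work from token ranges instead of keys: `nodetool describering <keyspace>` lists each token range together with its replica endpoints. The sketch below assumes those ranges have already been collected into a hypothetical `RANGES` list (the tokens and addresses are made up) and derives, for a dead node, every surviving host that shares at least one range with it, i.e. the candidates for a pre-replacement repair with `-hosts`.

```python
# Hypothetical ranges: (start_token, end_token, [replica endpoints]).
# Real values would come from `nodetool describering <keyspace>`.
RANGES = [
    (-9000, -3000, ["10.0.0.1", "10.0.0.2", "10.0.0.3"]),
    (-3000, 3000, ["10.0.0.2", "10.0.0.3", "10.0.0.4"]),
    (3000, -9000, ["10.0.0.4", "10.0.0.1", "10.0.0.2"]),  # wraps around the ring
]

def repair_partners(dead_node, ranges):
    """Every other endpoint that shares at least one range with dead_node."""
    partners = set()
    for _start, _end, endpoints in ranges:
        if dead_node in endpoints:
            partners.update(ep for ep in endpoints if ep != dead_node)
    return sorted(partners)

def repair_plan(dead_node, ranges):
    """For each range the dead node replicated, list the surviving replicas."""
    return [((start, end), [ep for ep in endpoints if ep != dead_node])
            for start, end, endpoints in ranges if dead_node in endpoints]

print(repair_partners("10.0.0.3", RANGES))  # ['10.0.0.1', '10.0.0.2', '10.0.0.4']
for (start, end), hosts in repair_plan("10.0.0.3", RANGES):
    print(start, end, hosts)
```

With vnodes the range list is much longer, but the logic is the same.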

From: Fd Habash
Sent: Thursday, December 7, 2017 12:09 PM
To: user@cassandra.apache.org
Subject: RE: When Replacing a Node, How to Force a Consistent Bootstrap

Thank you.

How do I identify the other 2 nodes that the downed node replicated with? Say the replica set is 3 nodes A, B, C, and C has been terminated by AWS and is gone. Using nodetool getendpoints assumes knowing a partition key value, but how do you even know which key to use?

If there is a way to identify A and B, I can then simply run ‘nodetool repair’ on either one to repair ALL of its ranges.

Thanks 

----------------
Thank you

From: kurt greaves
Sent: Wednesday, December 6, 2017 6:45 PM
To: User
Subject: Re: When Replacing a Node, How to Force a Consistent Bootstrap

That's also an option, but it's better to repair before and after if possible. If you don't repair beforehand, you could end up missing some replicas until you repair after the replacement, which could cause queries to return old or no data. Alternatively, you could use ALL after replacing until the repair completes.

For example, suppose A and C have a replica of a, and A dies. On replacement, A streams the partition owning a from B, so it is still inconsistent. A QUORUM query that hits A and B returns no results for a.

On 5 December 2017 at 23:04, Fred Habash <fm...@gmail.com> wrote:
Or, do a full repair after bootstrapping completes?



On Dec 5, 2017 4:43 PM, "Jeff Jirsa" <jj...@gmail.com> wrote:
You can't ask Cassandra to stream from the node with the "most recent data", because for some rows B may be most recent, and for others C may be most recent - you'd have to stream from both (which we don't support).

You'll need to repair (and you can repair before you do the replace to avoid the window of time where you violate consistency - use the -hosts option to allow repair with a down host, you'll repair A+C, so when B starts it'll definitely have all of the data).




Re: When Replacing a Node, How to Force a Consistent Bootstrap

Posted by kurt greaves <ku...@instaclustr.com>.

That's also an option, but it's better to repair before and after if
possible. If you don't repair beforehand, you could end up missing some
replicas until you repair after the replacement, which could cause queries
to return old or no data. Alternatively, you could use ALL after replacing
until the repair completes.

For example, suppose A and C have a replica of a, and A dies. On
replacement, A streams the partition owning a from B, so it is still
inconsistent. A QUORUM query that hits A and B returns no results for a.
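kurt's example can be checked with a toy read path. Assumptions: replicas are plain dicts of key to (value, timestamp), the coordinator contacts replicas in the order given (real replica selection depends on the snitch), and the response with the newest timestamp wins when results are merged.

```python
def read(replicas, key, consistency, rf=3):
    """Contact as many replicas as the consistency level needs, newest cell wins."""
    needed = {"QUORUM": rf // 2 + 1, "ALL": rf}[consistency]
    newest = None
    for data in replicas[:needed]:
        cell = data.get(key)  # (value, timestamp) or None
        if cell is not None and (newest is None or cell[1] > newest[1]):
            newest = cell
    return None if newest is None else newest[0]

b = {}                    # B never received partition a
c = {"a": ("row-a", 10)}  # C still has it
a = dict(b)               # replacement A streamed the range from B

print(read([a, b, c], "a", "QUORUM"))  # None: both contacted replicas lack a
print(read([a, b, c], "a", "ALL"))     # row-a: C's copy is included in the merge
```

Reading at ALL forces C into every read, which is why it masks the inconsistency until the post-replacement repair finishes.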


Re: When Replacing a Node, How to Force a Consistent Bootstrap

Posted by Fred Habash <fm...@gmail.com>.

Or, do a full repair after bootstrapping completes?

Re: When Replacing a Node, How to Force a Consistent Bootstrap

Posted by Jeff Jirsa <jj...@gmail.com>.

You can't ask Cassandra to stream from the node with the "most recent data",
because for some rows B may be most recent, and for others C may be most
recent - you'd have to stream from both (which we don't support).

You'll need to repair (and you can repair before you do the replace to
avoid the window of time where you violate consistency - use the -hosts
option to allow repair with a down host, you'll repair A+C, so when B
starts it'll definitely have all of the data).
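The repair-before-replace suggestion can be sketched the same way. This models only the end state of a repair between the surviving replicas (each participant ends up with the newest cell for every key, last write wins by timestamp); real repair compares Merkle trees and streams the mismatched ranges. The data is hypothetical: A holds a row that C missed, and C holds a row that A missed, so neither is a safe single streaming source on its own.

```python
def repair(*replicas):
    """Toy end state of a repair: every replica gets the newest cell per key."""
    merged = {}
    for data in replicas:
        for key, (value, ts) in data.items():
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)
    for data in replicas:
        data.clear()
        data.update(merged)

# B is down. A has row x (C missed it); C has row y (A missed it).
a = {"x": ("vx", 1)}
c = {"y": ("vy", 2)}

repair(a, c)        # repair the surviving replicas first, with B excluded
b_from_a = dict(a)  # the replacement B may stream from either replica...
b_from_c = dict(c)
assert b_from_a == b_from_c == {"x": ("vx", 1), "y": ("vy", 2)}
```

Once A and C agree, it no longer matters which one the replacement streams from.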
