Posted to user@cassandra.apache.org by "Peddi, Praveen" <pe...@amazon.com> on 2016/03/30 16:14:23 UTC

auto_bootstrap when a node is down

Hello all,
We just upgraded to 2.2.4 (from 2.0.9) and noticed an issue when adding new nodes. When we add a new node while no nodes in the cluster are down, everything works fine, but when we add a new node while one node is down, I see the error below. My understanding was that when auto_bootstrap is enabled, the bootstrap process uses QUORUM consistency, so it should work when one node is down. Is that not correct? Is there a way to add a new node with bootstrapping, but without using the replace address option? We use auto scaling, and a new node is added automatically when a node goes down. Since it's all scripted, I can't use replace address in the cassandra-env.sh file as a one-time option.

One fallback mechanism we could use is to disable auto bootstrap and let read repairs populate the data over time, but it's not ideal. Is this even a good alternative to this failure?

ERROR 20:30:45 Exception encountered during startup
java.lang.RuntimeException: A node required to move the data consistently is down (/xx.xx.xx.xx). If you wish to move the data from a potentially inconsistent replica, restart the node with -Dcassandra.consistent.rangemovement=false

Praveen

Re: auto_bootstrap when a node is down

Posted by Carlos Alonso <in...@mrcalonso.com>.

Mmm ok, then I think you may need to follow the standard dead node replacement
procedure:
https://docs.datastax.com/en/cassandra/2.2/cassandra/operations/opsReplaceNode.html

Cheers!

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 31 March 2016 at 16:34, Peddi, Praveen <pe...@amazon.com> wrote:

> Hi Carlos,
> In our case, the old node is dead and is not accessible. So I am not sure if
> we can use rsync in this case.
>
> Praveen

Re: auto_bootstrap when a node is down

Posted by "Peddi, Praveen" <pe...@amazon.com>.

Hi Carlos,
In our case, the old node is dead and is not accessible. So I am not sure if we can use rsync in this case.

Praveen

From: Carlos Alonso <in...@mrcalonso.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Thursday, March 31, 2016 at 10:31 AM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: auto_bootstrap when a node is down

If that's your use case, I've developed a quick disk-based replacement procedure.

Basically, all it involves is rsyncing the data from the old node to the new node and bringing the new one up as if it were the old one (only the IP will change). Step-by-step details here: http://mrcalonso.com/cassandra-instantaneous-in-place-node-replacement/

We've been using it for a while. It works nicely and avoids the time, resources, and babysitting that streaming data across nodes requires.

Regards

Carlos Alonso | Software Engineer | @calonso<https://twitter.com/calonso>

Re: auto_bootstrap when a node is down

Posted by Carlos Alonso <in...@mrcalonso.com>.

If that's your use case, I've developed a quick disk-based replacement
procedure.

Basically, all it involves is rsyncing the data from the old node to the new
node and bringing the new one up as if it were the old one (only the IP will
change). Step-by-step details here:
http://mrcalonso.com/cassandra-instantaneous-in-place-node-replacement/

We've been using it for a while. It works nicely and avoids the time,
resources, and babysitting that streaming data across nodes requires.

Regards

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 31 March 2016 at 15:26, Peddi, Praveen <pe...@amazon.com> wrote:

> Hi Paulo,
> Thanks a lot for the detailed explanation. Our use case is that when one
> node goes down, a new node in the same AZ comes up immediately (within 5 to
> 10 mins), and it is safe to assume that no other nodes in another AZ are
> down at that point in time. So based on your explanation, using
> -Dcassandra.consistent.rangemovement=false seems like the way to go for our
> use case. I will test it with that option.
>
> Thanks again.
>
> Praveen

Re: auto_bootstrap when a node is down

Posted by "Peddi, Praveen" <pe...@amazon.com>.

Hi Paulo,
Thanks a lot for the detailed explanation. Our use case is that when one node goes down, a new node in the same AZ comes up immediately (within 5 to 10 mins), and it is safe to assume that no other nodes in another AZ are down at that point in time. So based on your explanation, using -Dcassandra.consistent.rangemovement=false seems like the way to go for our use case. I will test it with that option.

Thanks again.

Praveen

From: Paulo Motta <pa...@gmail.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Wednesday, March 30, 2016 at 10:55 AM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: auto_bootstrap when a node is down

When you add a node, it takes over the range of an existing node, and thus it should stream data from that node to maintain consistency. If the existing node is unavailable, the new node may fetch the data from a different replica, which may be missing some of the data held by the node whose range is being taken over, and that may break consistency.

For example, imagine a ring with nodes A, B and C, RF=3. The row X=1 maps to node A and is replicated in nodes B and C, so the initial arrangement will be:

A(X=1), B(X=1) and C(X=1)

Node B is down and you write X=2 to A, which replicates the data only to C since B is down (and hinted handoff is disabled). The write succeeds at QUORUM. The new arrangement becomes:

A(X=2), B(X=1), C(X=2) (if any of the nodes is down, any read at QUORUM will fetch the correct value of X=2)

Now imagine you add a new node D between A and B. If D streams data from A, the new replica group will become:

A, D(X=2), B(X=1), C(X=2) (if any of the nodes is down, any read at QUORUM will fetch the correct value of X=2)

But if A is down when you bootstrap D and you have -Dcassandra.consistent.rangemovement=false, D may stream data from B, so the new replica group will be:

A, D(X=1), B(X=1), C(X=2)

Now, if C goes down, reads at QUORUM will succeed but return the stale value of X=1, so consistency is broken.

If you're continuously running repair and have hinted handoff and read repair enabled, the probability of something like this happening will decrease, but it may still happen. If this is not a problem, you may use the option -Dcassandra.consistent.rangemovement=false to bootstrap a node when another node is down. See CASSANDRA-2434 for more background.

Re: auto_bootstrap when a node is down

Posted by Paulo Motta <pa...@gmail.com>.

When you add a node, it takes over the range of an existing node, and thus
it should stream data from that node to maintain consistency. If the
existing node is unavailable, the new node may fetch the data from a
different replica, which may be missing some of the data held by the node
whose range is being taken over, and that may break consistency.

For example, imagine a ring with nodes A, B and C, RF=3. The row X=1 maps
to node A and is replicated in nodes B and C, so the initial arrangement
will be:

A(X=1), B(X=1) and C(X=1)

Node B is down and you write X=2 to A, which replicates the data only to C
since B is down (and hinted handoff is disabled). The write succeeds at
QUORUM. The new arrangement becomes:

A(X=2), B(X=1), C(X=2) (if any of the nodes is down, any read at QUORUM
will fetch the correct value of X=2)

Now imagine you add a new node D between A and B. If D streams data from A,
the new replica group will become:

A, D(X=2), B(X=1), C(X=2) (if any of the nodes is down, any read at QUORUM
will fetch the correct value of X=2)

But if A is down when you bootstrap D and you have
-Dcassandra.consistent.rangemovement=false, D may stream data from B, so
the new replica group will be:

A, D(X=1), B(X=1), C(X=2)

Now, if C goes down, reads at QUORUM will succeed but return the stale
value of X=1, so consistency is broken.
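
The scenario above can be checked with a small toy model. This is only a
sketch in Python, not Cassandra code: the function name quorum_read_results,
the timestamps 1 and 2, and the dictionaries are all made up for
illustration. It assumes a QUORUM read contacts any 2 of the 3 live replicas
and returns the value with the newest write timestamp, and that after D
bootstraps the replica group for X is D, B and C, as in the example.

```python
from itertools import combinations

RF = 3
QUORUM = RF // 2 + 1  # 2 of 3 replicas


def quorum_read_results(replicas, down=frozenset()):
    """Return the set of values a QUORUM read could possibly return.

    `replicas` maps node name -> (write_timestamp, value). Every choice of
    QUORUM live replicas is enumerated; each read resolves to the value
    with the newest timestamp among the replicas it contacted.
    """
    live = [name for name in replicas if name not in down]
    if len(live) < QUORUM:
        raise RuntimeError("not enough live replicas for QUORUM")
    results = set()
    for group in combinations(live, QUORUM):
        newest = max(replicas[name] for name in group)  # timestamp compares first
        results.add(newest[1])
    return results


OLD = (1, "X=1")  # written while all nodes were up
NEW = (2, "X=2")  # written at QUORUM while B was down

# Before bootstrap: A and C have the new value, B is stale.
before = {"A": NEW, "B": OLD, "C": NEW}

# After bootstrap (assumed replica group for X: D, B, C).
consistent = {"D": NEW, "B": OLD, "C": NEW}    # D streamed from A
inconsistent = {"D": OLD, "B": OLD, "C": NEW}  # D streamed from B (A was down)

print(quorum_read_results(before))                      # only X=2 is possible
print(quorum_read_results(consistent, down={"C"}))      # still only X=2
print(quorum_read_results(inconsistent, down={"C"}))    # only the stale X=1
```

In this model the inconsistent case is already at risk with all nodes up: a
read that happens to contact only D and B returns X=1, while one that
includes C returns X=2.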

If you're continuously running repair and have hinted handoff and read
repair enabled, the probability of something like this happening will
decrease, but it may still happen. If this is not a problem, you may use the
option -Dcassandra.consistent.rangemovement=false to bootstrap a node when
another node is down. See CASSANDRA-2434 for more background.
