Posted to user@cassandra.apache.org by Carl Mueller <ca...@smartthings.com.INVALID> on 2019/06/11 20:11:03 UTC

postmortem on 2.2.13 scale out difficulties

We had a three-DC (asia-tokyo/europe/us) cassandra 2.2.13 cluster, AWS, IPV6

Needed to scale out the asia datacenter, which was 5 nodes; europe and us
were 25 nodes.

We were running into bootstrapping issues where the new node failed to
bootstrap/stream; it failed with

"java.lang.RuntimeException: A node required to move the data consistently
is down"

...even though they were all up based on nodetool status prior to adding
the node.

First we increased the phi_convict_threshold to 12, and that did not help.

CASSANDRA-12281 appeared similar to the problem we were having, but I don't
think we hit that. Somewhere in that ticket someone wrote:

"For us, the workaround is either deleting the data (then bootstrap again),
or increasing the ring_delay_ms. And the larger the cluster is, the longer
ring_delay_ms is needed. Based on our tests, for a 40 nodes cluster, it
requires ring_delay_ms to be >50seconds. For a 70 nodes cluster,
>100seconds. Default is 30seconds."

Given the WAN nature of our DCs, we set ring_delay_ms to 100 seconds and
it finally worked.
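
For reference, this is roughly what the two changes look like (a sketch of
our setup; exact file locations depend on the install, and ring_delay_ms is
a JVM system property rather than a cassandra.yaml setting):

# cassandra.yaml -- raise the failure detector threshold (default is 8)
phi_convict_threshold: 12

# cassandra-env.sh (or wherever JVM_OPTS get assembled) -- 100 seconds,
# the default ring delay is 30000 ms
JVM_OPTS="$JVM_OPTS -Dcassandra.ring_delay_ms=100000"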

side note:

During the rolling restarts for setting phi_convict_threshold we observed
quite a lot of status map variance between nodes (we have a program that
polls every node in a datacenter or cluster for its view of gossipinfo and
node statuses). AWS does appear to have networking variance, as the
phi_convict_threshold advice suggests; I'm not sure if our difficulties
were typical in that regard and/or if our IPV6 and/or globally distributed
datacenters were exacerbating factors.
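
Not our actual tool, but a minimal sketch of the idea in Python: ask each
node for its own view of the ring via nodetool and flag any disagreements
(hostnames are placeholders; the real program also pulls gossipinfo):

import subprocess

NODES = ["cass-asia-1.example.com", "cass-eu-1.example.com"]  # placeholders

def ring_view(host):
    """Return the set of (address, state) pairs as seen from one node."""
    out = subprocess.run(["nodetool", "-h", host, "status"],
                         capture_output=True, text=True, check=True).stdout
    view = set()
    for line in out.splitlines():
        parts = line.split()
        # Data rows of 'nodetool status' start with a two-letter state (UN, DN, UJ, ...)
        if parts and len(parts[0]) == 2 and parts[0][0] in "UD":
            view.add((parts[1], parts[0]))  # (address, state)
    return view

views = {h: ring_view(h) for h in NODES}
baseline = NODES[0]
for host, view in views.items():
    if view != views[baseline]:
        print(host, "disagrees with", baseline, ":", view ^ views[baseline])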

We could not reproduce this in loadtest, although loadtest covers only eu
and us (but is IPV6).

Re: postmortem on 2.2.13 scale out difficulties

Posted by Carl Mueller <ca...@smartthings.com.INVALID>.
We're getting

DEBUG [GossipStage:1] 2019-06-12 15:20:07,797 MigrationManager.java:96 -
Not pulling schema because versions match or shouldPullSchemaFrom returned
false

multiple times, as it contacts the nodes.
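
(That DEBUG line usually just means the schema versions already match, or
the peer isn't one the node would pull schema from. A quick way to confirm
cluster-wide schema agreement is

nodetool describecluster

which lists the schema version reported by each node.)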

Re: postmortem on 2.2.13 scale out difficulties

Posted by Carl Mueller <ca...@smartthings.com.INVALID>.
We were only able to scale out four nodes before failures started
occurring, including multiple instances of nodes joining the cluster
without streaming.

Sigh.
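
(For anyone hitting the same thing: a node listed in its own seed list
skips bootstrap streaming entirely, and

nodetool netstats

on the joining node shows whether any bootstrap streaming sessions are
actually running.)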
