Posted to commits@cassandra.apache.org by "Berenguer Blasi (Jira)" <ji...@apache.org> on 2021/02/01 11:29:00 UTC

[jira] [Commented] (CASSANDRA-16408) Unable to bootstrap/join new nodes to existing 4.0 cluster

    [ https://issues.apache.org/jira/browse/CASSANDRA-16408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276270#comment-17276270 ] 

Berenguer Blasi commented on CASSANDRA-16408:
---------------------------------------------

I'm one step behind you, as I just started on this one. So far I can't reproduce it locally:

{noformat}
Datacenter: testdc1
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.4  68.54 KiB  1       ?                 8850c29c-2b65-4162-ada5-53fcd6551b9a  RAC1
UN  127.0.0.3  73.59 KiB  1       ?                 ce0d1235-7704-4679-9df4-dd4621e8dafe  RAC3
UN  127.0.0.2  68.55 KiB  1       ?                 4199bb8e-1a5e-49cf-99f3-ebbadceef91c  RAC2
UN  127.0.0.1  68.58 KiB  1       ?                 49a0f995-4c9b-484b-a15c-515986049951  RAC1
{noformat}

But given what I've seen in other bootstrap bugs I've worked on recently, I'd bet this is down to the EC2Snitch, and we'd need to try to reproduce it on a real AWS cluster.
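
For anyone comparing runs, here is a minimal Python sketch that scans `nodetool status` text like the output above and lists any node whose status/state code is not `UN` (for example `UJ`, the stuck state described in the report). The function names `parse_status` and `not_normal` are made up for illustration and are not part of Cassandra or nodetool. The row pattern assumes the two-letter code and address layout shown above, so it may need adjusting for other output variants.

```python
import re

# Sample copied from the local `nodetool status` output shown above.
SAMPLE = """\
Datacenter: testdc1
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.4  68.54 KiB  1       ?                 8850c29c-2b65-4162-ada5-53fcd6551b9a  RAC1
UN  127.0.0.3  73.59 KiB  1       ?                 ce0d1235-7704-4679-9df4-dd4621e8dafe  RAC3
UN  127.0.0.2  68.55 KiB  1       ?                 4199bb8e-1a5e-49cf-99f3-ebbadceef91c  RAC2
UN  127.0.0.1  68.58 KiB  1       ?                 49a0f995-4c9b-484b-a15c-515986049951  RAC1
"""

# Assumption: a node row starts with a two-letter code, U/D (status)
# followed by N/L/J/M (state), then whitespace and the node address.
ROW = re.compile(r"^([UD][NLJM])\s+(\S+)\s")


def parse_status(text):
    """Return a list of (code, address) tuples found in nodetool status text."""
    nodes = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            nodes.append((match.group(1), match.group(2)))
    return nodes


def not_normal(text):
    """Return the addresses of nodes whose code is anything other than UN."""
    return [address for code, address in parse_status(text) if code != "UN"]


# All four local nodes are UN, so this prints an empty list.
print(not_normal(SAMPLE))
```

On a cluster showing the reported behaviour, the new node's address would be expected to appear in the returned list for as long as it stays in `UJ`.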

> Unable to bootstrap/join new nodes to existing 4.0 cluster
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-16408
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16408
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Membership, Consistency/Bootstrap and Decommission
>            Reporter: J Hickey
>            Assignee: Berenguer Blasi
>            Priority: Normal
>             Fix For: 4.0-beta
>
>
> Trying to add a new node to an existing 4.0 cluster gets stuck in bootstrap/joining permanently with no clear error.
> Version: 4.0-beta4 (issue also seen in 4.0-beta3, and NOT seen in 3.11.x) and Java 8 (Open JDK 1.8.0_275)
> Topology: 3 rack single DC using EC2Snitch, 1 seed node per rack
> Relevant cassandra.yaml settings: 
> {code}
> auto_bootstrap: true (implicit)
> seeds contains the same 3 nodes on all nodes
> num_tokens: 16
> allocate_tokens_for_local_replication_factor: 3
> server_encryption_options.internode_encryption: all
> server_encryption_options.enabled: true
> server_encryption_options.optional: false
> server_encryption_options.require_client_auth: true
> client_encryption_options.enabled: true
> client_encryption_options.optional: false
> client_encryption_options.require_client_auth: true
> {code}
> Scenario: 
> * Bring up the 3 seed nodes to create a new cluster. 
> * Add a user keyspace: create keyspace test with replication = { 'class': 'NetworkTopologyStrategy', 'us-east-1-dc': 3 }; and insert some test data. 
> * Wait at least 10 minutes after the initial 3 seed nodes come up (nodes will join if they are brought up at the same time as the seeds, but not if they are brought up later). 
> * Start cassandra on a fourth node. 
> Cassandra begins to bootstrap but never finishes (I have left this running overnight), and it neither exits nor logs any errors. Nodetool status from any node shows the new node as UJ. Nodetool netstats from the new node shows it receiving a file from the test keyspace at 100% received. Logs show bootstrap starting and streaming starting, but then nothing further and no errors.
> It is worth noting that I have also tried this with allocate_tokens_for_local_replication_factor disabled and still have this issue. I have also tried this without any user keyspace or data, on a completely empty cluster, and still have this issue. The only way I seem to be able to bring up a decently sized cluster on 4.0 is to disable allocate_tokens_for_local_replication_factor (to avoid collisions, as mentioned in other issues) and bring up all nodes at about the same time, or to use auto_bootstrap: false. I have no issue adding a new node in a similar fashion to a 3.11.x cluster.



