Posted to commits@cassandra.apache.org by "Joseph Clay (Jira)" <ji...@apache.org> on 2021/10/07 07:08:00 UTC

[jira] [Commented] (CASSANDRA-16518) Node restart during joining sets protocol version to V3

    [ https://issues.apache.org/jira/browse/CASSANDRA-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425372#comment-17425372 ] 

Joseph Clay commented on CASSANDRA-16518:
-----------------------------------------

I was finally able to reproduce this. The issue only reproduces if Cassandra is configured so that it uses preferred_ip: each node needs two IP addresses, with the broadcast address set to a different IP than the listen address.

Here is how I reproduced it in CCM:
{noformat}
# Set ccm directory here
CCM_DIR=~/.ccm

# create cluster but don't start it
ccm create test -v 3.11.11 -n 3

# Configure cluster to use GossipingPropertyFileSnitch with prefer_local and different ips for broadcast and listen addresses
rm $CCM_DIR/test/node*/conf/cassandra-topology.properties
sed -i 's/# prefer_local=true/prefer_local=true/' $CCM_DIR/test/node*/conf/cassandra-rackdc.properties
sed -i 's/endpoint_snitch.*/endpoint_snitch: GossipingPropertyFileSnitch/' $CCM_DIR/test/node*/conf/cassandra.yaml
sed -i 's/  - seeds:.*/  - seeds: 127.0.1.1/' $CCM_DIR/test/node*/conf/cassandra.yaml
echo 'broadcast_address: 127.0.1.1' >> $CCM_DIR/test/node1/conf/cassandra.yaml
echo 'broadcast_address: 127.0.1.2' >> $CCM_DIR/test/node2/conf/cassandra.yaml
echo 'broadcast_address: 127.0.1.3' >> $CCM_DIR/test/node3/conf/cassandra.yaml
echo 'listen_on_broadcast_address: true' >> $CCM_DIR/test/node1/conf/cassandra.yaml
echo 'listen_on_broadcast_address: true' >> $CCM_DIR/test/node2/conf/cassandra.yaml
echo 'listen_on_broadcast_address: true' >> $CCM_DIR/test/node3/conf/cassandra.yaml

# Start nodes 1 and 2; skip the wait because the above config breaks it
ccm node1 start --skip-wait-other-notice
ccm node2 start --skip-wait-other-notice

# Wait for both nodes to be UN then put roughly a GB of data into the cluster
ccm stress write n=5000000

# Slow down streaming so joining takes long enough to see the issue, then start the 3rd node
ccm node1 nodetool setstreamthroughput 1
ccm node2 nodetool setstreamthroughput 1
sed -i 's/auto_bootstrap.*/auto_bootstrap: true/' $CCM_DIR/test/node3/conf/cassandra.yaml
ccm node3 start --skip-wait-other-notice

# Wait for streaming to start, then check which node node3 is streaming from
ccm node3 nodetool netstats
# On my cluster node1 was streaming, so I stopped and started node2
ccm node2 stop
ccm node2 start --skip-wait-other-notice{noformat}
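The "wait for both nodes to be UN" step can be scripted rather than checked by hand. This is a minimal sketch, assuming that `nodetool status` prints one line per node starting with its two-letter state code and that ccm passes that output through on stdout; the function names are made up for illustration.

```shell
# Count nodes reported as Up/Normal in `nodetool status` output read from stdin.
# Assumption: each node line starts with the state code, e.g. "UN  127.0.1.1 ...".
count_up_normal() {
    # grep -c prints 0 and exits non-zero when nothing matches; keep the exit status clean
    grep -c '^UN[[:space:]]' || true
}

# Hypothetical helper: block until at least $1 nodes are UN, as seen from node1.
wait_for_un() {
    until [ "$(ccm node1 nodetool status | count_up_normal)" -ge "$1" ]; do
        sleep 5
    done
}
```

Calling `wait_for_un 2` before `ccm stress write n=5000000` would replace the manual wait.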
After that, the restarted node logged this:
INFO [main] 2021-10-07 17:54:40,637 ConfiguredLimit.java:108 - Detected peers which do not fully support protocol V4. Capping max negotiable version to V3
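To check for this message without reading the log by hand, something like the following should work. It is only a sketch: the function name is made up, and the log path in the usage note assumes ccm's usual layout of one logs directory per node.

```shell
# Succeeds (exit status 0) if the given log file contains the capping message.
protocol_capped() {
    grep -q 'Capping max negotiable version to V3' "$1"
}
```

For the cluster above this would be called as `protocol_capped "$CCM_DIR/test/node2/logs/system.log" && echo "node2 is capped to V3"`.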

Also, connecting to the restarted node with cqlsh fails:
Connection error: ('Unable to connect to any servers', {'127.0.0.2:9042': DriverException('ProtocolError returned from server while using explicitly set client protocol_version 4')})

The node log shows this error for the attempted connection:
WARN [epollEventLoopGroup-2-6] 2021-10-07 17:56:04,822 NoSpamLogger.java:94 - Protocol exception with client networking: org.apache.cassandra.transport.ProtocolException: Invalid or unsupported protocol version (4); supported versions are (3/v3, 4/v4, 5/v5-beta)

You can cqlsh if you force the protocol version: cqlsh 127.0.0.2 --protocol-version=3

Peers table:
{noformat}
 peer      | data_center | host_id                              | preferred_ip | rack  | release_version | rpc_address | schema_version                       | tokens
-----------+-------------+--------------------------------------+--------------+-------+-----------------+-------------+--------------------------------------+--------------------------
 127.0.1.1 |         dc1 | d4d86d82-d12b-4bae-a1a4-7f0ded9d9b79 |    127.0.0.1 | rack1 |         3.11.11 |   127.0.0.1 | d25c2da9-b17e-3aba-8682-6cf063aaca51 | {'-9223372036854775808'}
 127.0.1.3 |        null |                                 null |    127.0.0.3 |  null |            null |        null |                                 null |                     null{noformat}
As you can see, node3 (the joining node) has nulls in every column except peer and preferred_ip. I believe the null in the release_version column causes the version check to fail.
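To make the suspected failure mode concrete, here is a toy model of the check in shell. It is not the Cassandra code: the function name is made up, and the rule it applies (any peer with a missing release version, or a major version below 3, caps the node at V3) is my simplified reading of the behaviour.

```shell
# Toy model of the peer version check (NOT the real ConfiguredLimit logic).
# Arguments: one release_version per peer; "null" or "" stands for a missing value.
max_negotiable_version() {
    for v in "$@"; do
        case "$v" in
            ""|null) echo V3; return ;;   # missing version is treated as failing the check
        esac
        major=${v%%.*}                    # "3.11.11" -> "3"
        if [ "$major" -lt 3 ]; then       # assumed threshold for V4 support
            echo V3
            return
        fi
    done
    echo V4
}
```

With the peers table above, `max_negotiable_version 3.11.11 null` prints V3 even though every node runs 3.11.11, which matches what the restarted node logged.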

> Node restart during joining sets protocol version to V3
> -------------------------------------------------------
>
>                 Key: CASSANDRA-16518
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16518
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Client
>            Reporter: Joseph Clay
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 3.11.x
>
>
> While joining nodes to a cluster, an old node crashed. The old node was recovered; however, clients (DataStax Java driver) refused to connect to it.
> The driver error:
> {noformat}
> Detected added or restarted Cassandra host /<ip>:<port> but ignoring it since it does not support the version V4 of the native protocol which is currently in use.{noformat}
> In the recovered node cassandra logs:
> {noformat}
> INFO  o.a.c.transport.ConfiguredLimit Detected peers which do not fully support protocol V4. Capping max negotiable version to V3{noformat}
> I confirmed that ALL the nodes in the cluster, joining or otherwise, were apache-cassandra-3.11.6, so that error message was rather confusing.
>  Eventually, after digging through the code, we got to the bottom of the issue:
> https://issues.apache.org/jira/browse/CASSANDRA-15193 adds a check for node version, which reverts the protocol version to V3 if any peer fails the version check. Joining nodes have NULL for their version in the peers table, which fails the version check.
>  


