Posted to commits@cassandra.apache.org by "Aswin Karthik (Jira)" <ji...@apache.org> on 2022/11/16 09:18:00 UTC

[jira] [Created] (CASSANDRA-18053) Node disconnection during cassandra 4.0 upgrade from cassandra 3.11

Aswin Karthik created CASSANDRA-18053:
-----------------------------------------

             Summary: Node disconnection during cassandra 4.0 upgrade from cassandra 3.11
                 Key: CASSANDRA-18053
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18053
             Project: Cassandra
          Issue Type: Bug
            Reporter: Aswin Karthik


We are running Cassandra 3.11.11 and upgrading to 4.0.5.

The nodes use port 11044 as their storage port.

 

Our upgrade process is the usual one:
 * Boot Cassandra 4.0.5 on the existing 3.11.11 data disk
 * Run {{nodetool upgradesstables}}

 

However, during the upgrade, a random node is occasionally unable to connect to the other nodes in the cluster. The problem is intermittent and goes away when the node is restarted.

 

On further diagnosis, we found that the problematic node uses port 7000 for some of its communication instead of the configured port (11044):

 
{noformat}
 InboundConnectionInitiator.java:127 - Listening on address: (node-1.dev/x.x.x.x:11044), nic: eth0, encryption: optionally encrypted(openssl)
OutboundConnection.java:1150 - node-1.dev/x.x.x.x:7000(/x.x.x.x:50424)->/y.y.y.y:11044-URGENT_MESSAGES-3c193918 successfully connected, version = 12, framing = LZ4, encryption = encrypted(factory=openssl;protocol=TLSv1.2;cipher=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384){noformat}
Note the x.x.x.x:7000 in the outbound log line, even though x.x.x.x is listening on 11044.

Restarting the node fixes this.

 

The logs after the restart:
{noformat}
 InboundConnectionInitiator.java:127 - Listening on address: (/x.x.x.x:11044), nic: eth0, encryption: optionally encrypted(openssl)
InboundConnectionInitiator.java:464 - /y.y.y.y:11044(/y.y.y.y:40656)->/x.x.x.x:11044-URGENT_MESSAGES-cade4755 messaging connection established, version = 12, framing = CRC, encryption = encrypted(factory=openssl;protocol=TLSv1.2;cipher=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384)
OutboundConnection.java:1150 - /x.x.x.x:11044(/x.x.x.x:53316)->/y.y.y.y:11044-URGENT_MESSAGES-92d99f23 successfully connected, version = 12, framing = LZ4, encryption = encrypted(factory=openssl;protocol=TLSv1.2;cipher=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384)
 {noformat}
 

Note that the outbound connection log line shows x.x.x.x:11044 this time.
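While an upgrade is in progress, the symptom can be spotted from the logs alone: an OutboundConnection line whose local endpoint shows a port other than the configured storage port. Below is a minimal Python sketch of such a check. It is a hypothetical helper, not a Cassandra tool; the regular expression is inferred only from the two OutboundConnection lines quoted above, and `find_wrong_port` and its `storage_port=11044` default are names chosen here for illustration.

```python
import re

# Hypothetical pattern, inferred only from the OutboundConnection lines
# quoted in this report (not from Cassandra's source). It captures the
# local endpoint that precedes "(", e.g. "node-1.dev/x.x.x.x:7000(".
# Assumes a hostname/IPv4 address without colons.
OUTBOUND_RE = re.compile(
    r"OutboundConnection\.java:\d+ - "
    r"[^/\s]*/(?P<addr>[^:(\s]+):(?P<port>\d+)\("
)


def find_wrong_port(lines, storage_port=11044):
    """Return (addr, port) for every outbound connection line whose local
    endpoint uses a port other than storage_port (hypothetical helper)."""
    mismatches = []
    for line in lines:
        match = OUTBOUND_RE.search(line)
        if match and int(match.group("port")) != storage_port:
            mismatches.append((match.group("addr"), int(match.group("port"))))
    return mismatches


if __name__ == "__main__":
    sample = [
        # Outbound line from the misbehaving node (local port 7000)
        "OutboundConnection.java:1150 - node-1.dev/x.x.x.x:7000(/x.x.x.x:50424)"
        "->/y.y.y.y:11044-URGENT_MESSAGES-3c193918 successfully connected",
        # Outbound line after the restart (local port 11044)
        "OutboundConnection.java:1150 - /x.x.x.x:11044(/x.x.x.x:53316)"
        "->/y.y.y.y:11044-URGENT_MESSAGES-92d99f23 successfully connected",
    ]
    print(find_wrong_port(sample))  # [('x.x.x.x', 7000)]
```

Because `find_wrong_port` accepts any iterable of lines, it could be pointed at a node's log with `with open(path) as f: print(find_wrong_port(f))`, where `path` is an assumption that depends on where the installation writes its logs. Inbound lines (InboundConnectionInitiator) are deliberately ignored, since only the outbound lines showed the wrong port.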

 

The issue occurs at random.

 

This looks like a bug. Is there a fix for it, or are we missing a step in the upgrade?

 

Relevant sections of cassandra.yaml, used on both the Cassandra 3.x and 4.x nodes:

 
{noformat}
storage_port: 11044
ssl_storage_port: 11044
server_encryption_options:
    internode_encryption: all
    keystore: ---------
    keystore_password: -------
    truststore: ---------
    truststore_password: ---------
    protocol: TLSv1.2
    algorithm: PKIX
    store_type: PKCS12
    cipher_suites:
        - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
    require_client_auth: true {noformat}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
