Posted to user@cassandra.apache.org by Michael Carlise <mc...@salesforce.com.INVALID> on 2019/08/26 20:55:48 UTC

unable to gossip with peers exception when internode encryption is set to any setting other than 'none'

I originally opened this issue on Stack Overflow (
https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception
).

However, I haven't gotten any responses in over a week, so I'm posting it
here in the hope that someone will have an idea of where I can look.

We currently run a multi-region Cassandra cluster in AWS: four regions,
12 nodes per region. It runs without node-to-node encryption (and without
client encryption). We are trying to enable inter-datacenter node-to-node
encryption. However, when we flip encryption on, we get an exception
saying that nodes are unable to gossip with any peers.

It could be that we didn't build our JKS keystore/truststores correctly
(more on how we built these files below). But we also do not see
intra-datacenter communication working (which should remain unencrypted).
Additionally, cqlsh cannot connect to the node either, even though
require_client_auth is left at its default of false.

ERROR [main] 2019-08-15 18:46:32,241 CassandraDaemon.java:749 - Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any peers
        at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1435) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:566) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:683) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:632) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) [apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620) [apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) [apache-cassandra-3.11.4.jar:3.11.4]
INFO  [main] 2019-08-15 18:47:07,384 YamlConfigurationLoader.java:89 - Configuration location: file:/etc/cassandra/cassandra.yaml


Note that this error message only appears after the node has been up for a
few minutes (i.e. there is a delay between startup and the exception being
thrown).

*Information about our cassandra setup*

cassandra version: 3.11.4
JDK version: openjdk-8.
Linux: Ubuntu 18.04 (bionic).

*cassandra.yaml*

endpoint_snitch: Ec2MultiRegionSnitch

server_encryption_options:
  internode_encryption: dc
  keystore: <omitted>
  keystore_password: <omitted>
  truststore: <omitted>
  truststore_password: <omitted>

client_encryption_options:
  enabled: false

*cassandra-rackdc.properties*

prefer_local=true

*No obvious errors with SSL output*

When starting Cassandra with JVM_OPTS="$JVM_OPTS -Djavax.net.debug=ssl" added
to cassandra-env.sh, we see SSL logs printed to stdout (*Note: Subject and
Issuer were omitted on purpose*).

found key for : cassy-us-west-2
adding as trusted cert:
  Subject: ...
  Issuer:  ...
  Algorithm: RSA; Serial number: 0xdad28d843fc73325d4c1a75207d4e74
  Valid from Fri May 27 00:00:00 UTC 2016 until Tue May 26 23:59:59 UTC 2026

...

trigger seeding of SecureRandom
done seeding SecureRandom

Compared against Java SE SSL/TLS connection debugging
<https://docs.oracle.com/javase/7/docs/technotes/guides/security/jsse/ReadDebug.html>,
this looks correct. Note, however, that we see this series of messages
(along with the RSA key signature output) repeated several times in rapid
succession. We never observe any messages about the truststore being loaded;
however, that might be something that occurs only on client initiation (?)

Additionally, we do see cassandra report that the Encrypted Messaging
service has been started.

INFO  [main] 2019-08-15 18:45:31,022 MessagingService.java:704 -
Starting Encrypted Messaging Service on SSL port 7001
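
Since the log reports the encrypted messaging service listening on 7001, one
way to test the TLS handshake independently of Cassandra is to connect with
openssl s_client and validate the presented chain against the same CA bundle.
A minimal sketch: the function name probe_internode_tls, the RUN dry-run
switch, and the default ca.crt path are assumptions for illustration, not
anything from our setup.

```shell
# Dry-run by default: prints the command. Call with RUN= (empty) to execute it.
RUN=${RUN-echo}

# probe_internode_tls HOST [PORT] [CAFILE]
probe_internode_tls() {
    host=$1
    port=${2:-7001}       # Cassandra's default ssl_storage_port
    cafile=${3:-ca.crt}   # assumed path to the root + intermediate bundle
    # -showcerts prints the chain the peer presents; </dev/null closes stdin
    # so s_client exits after the handshake instead of waiting for input.
    $RUN openssl s_client -connect "${host}:${port}" -CAfile "$cafile" -showcerts </dev/null
}
```

Running `RUN= probe_internode_tls 10.0.2.7` against a node with encryption
enabled should print the peer's certificate chain and a "Verify return code"
line; if require_client_auth were enabled for internode traffic, -cert and
-key would presumably need to be added as well.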

*Doesn't appear to be a cassandra.yaml configuration problem*

We can bring the node back online by simply configuring internode_encryption:
none. This action seems to rule out a broadcast_address or rpc_address
configuration problem.

*How we built our keystore/truststores*

We followed the basic template in the DataStax docs for preparing SSL certificates
<https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/configuration/secureSSLCertWithCA.html>.
One minor difference was that our private keys and CSRs were generated using
openssl, one per region (we plan to share the key/signed cert across the
nodes in a region). They were created using a command template like:

openssl req -new -newkey rsa:2048 -out cassy-<region>.csr \
    -keyout cassy-<region>.key -config cassy-<region>.conf -subj "..." \
    -nodes -sha256

The generated CSR was then signed by an internal root CA. Because we
generated our files using openssl, we had to build our jks files by
importing our certs into them.

*Commands to generate truststore*

We distribute this one file to all nodes.

keytool -importcert \
    -keystore generic-server-truststore.jks \
    -alias rootCa \
    -file rootCa.crt \
    -noprompt \
    -keypass omitted \
    -storepass omitted

*Commands to generate keystore*

This was done once per region. Essentially, we created a keystore with
keytool, deleted the key entry, and then imported our own key entry from a
PKCS12 file using keytool.

keytool -genkeypair -keyalg RSA -alias cassy-${region} \
    -keystore cassy-${region}.jks -storepass omitted -keypass omitted \
    -validity 365 -keysize 2048 -dname "..."

keytool -delete -alias cassy-${region} -keystore cassy-${region}.jks \
    -storepass omitted

openssl pkcs12 -export -in signed_certs/${region}.pem \
    -inkey keys/cassandra.${region}.key -name cassy-${region} \
    -out ${region}.p12

keytool -importkeystore -deststorepass omitted \
    -destkeystore cassy-${region}.jks -srckeystore ${region}.p12 \
    -srcstoretype PKCS12

keytool -importcert -keystore cassy-${region}.jks -alias rootCa \
    -file ca.crt -noprompt -keypass omitted -storepass omitted

Looking back at this, I don't remember why we used keytool to generate a
keypair/keystore, then deleted and imported. I think it was because the
keytool importkeystore command refused to run if the keystore didn't
already exist.
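
As far as I can tell, keytool -importkeystore creates the destination
keystore when it does not exist yet, so the genkeypair/delete round trip may
not be needed. A shorter sketch of the same build, reusing the file layout
from the commands above; the function name build_keystore and the RUN dry-run
switch are hypothetical, and passwords are left out so the tools prompt for
them.

```shell
# Dry-run by default: prints the commands. Call with RUN= (empty) to execute them.
RUN=${RUN-echo}

# build_keystore REGION
build_keystore() {
    region=$1
    # Bundle the private key and the signed cert chain into a PKCS12 file.
    $RUN openssl pkcs12 -export \
        -in "signed_certs/${region}.pem" \
        -inkey "keys/cassandra.${region}.key" \
        -name "cassy-${region}" \
        -out "${region}.p12"
    # Convert the PKCS12 file into a JKS keystore (assumed to be created if absent).
    $RUN keytool -importkeystore \
        -srckeystore "${region}.p12" -srcstoretype PKCS12 \
        -destkeystore "cassy-${region}.jks" -deststoretype JKS
}
```

`build_keystore us-west-2` prints the two commands for review;
`RUN= build_keystore us-west-2` runs them.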

*ca.crt and pem file*

The ca.crt file contains the root certificate and the intermediate
certificate that was used to sign the CSR. The pem file contains the signed
CSR returned to us, the intermediate cert, and the root CA (in that order).
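
To double-check what each bundle really contains, the certificates can be
counted and listed in file order. A small sketch; the helper names
count_pem_certs and print_pem_chain are made up for illustration.

```shell
# count_pem_certs FILE: number of certificates in a PEM bundle.
count_pem_certs() {
    # Every certificate has exactly one BEGIN line; -- stops option parsing
    # because the pattern itself starts with dashes.
    grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# print_pem_chain FILE: subject and issuer of every certificate, in file order.
print_pem_chain() {
    openssl crl2pkcs7 -nocrl -certfile "$1" | openssl pkcs7 -print_certs -noout
}
```

Given the description above, `count_pem_certs ca.crt` would be expected to
print 2 and `count_pem_certs signed_certs/us-west-2.pem` to print 3.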

*openssl verify ca.crt and pem*

openssl verify -CAfile ca.crt us-west-2.pem
signed_certs/us-west-2.pem: OK

*Command output after enabling encryption*

*nodetool status (output truncated)*

Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns (effective)  Host ID  Rack
?N  52.44.11.221    ?           256     25.4%             null     1c
...
?N  52.204.232.195  ?           256     23.2%             null     1d
Datacenter: us-west-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns (effective)  Host ID  Rack
?N  34.209.2.144    ?           256     26.5%             null     2c
UN  52.40.32.177    105.99 GiB  256     23.7%             null     2c
?N  34.210.109.203  ?           256     24.7%             null     2a
...

The one node shown as online (UN) is the node with encryption enabled.
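
With 48 nodes the listing is long, so a per-state tally makes it easier to
compare runs. A rough sketch that reads nodetool status output on stdin; the
function name summarize_nodetool_status is made up for illustration.

```shell
# Reads `nodetool status` output on stdin and prints "<state> <count>" per line.
summarize_nodetool_status() {
    # Node rows start with a two-character code: status (U, D or ?) followed
    # by state (N, L, J or M). Headers and wrapped Rack lines do not match.
    awk '$1 ~ /^[UD?][NLJM]$/ { count[$1]++ }
         END { for (s in count) print s, count[s] }' | sort
}
```

Usage: `nodetool status | summarize_nodetool_status`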

*cqlsh to localhost*

cassy-node6:~$ cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1':
error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
Connection refused")})

*cqlsh to remote node* Remote node is a node with encryption enabled

cassy-node6:~$ cqlsh 10.0.2.7
Connection error: ('Unable to connect to any servers', {'10.0.2.7':
error(111, "Tried connecting to [('10.0.2.7', 9042)]. Last error:
Connection refused")})

Re: unable to gossip with peers exception when internode encryption is set to any setting other than 'none'

Posted by Michael Carlise <mc...@salesforce.com.INVALID>.
For clarity, for anybody who comes across this thread in the archive: this
might be an issue with Ec2MultiRegionSnitch altogether; I'm not sure.  But if
I create a local 3-node cluster using ccm (Cassandra v3.11.4), I can drop
the keystore/truststore JKS files in and flip encryption on, and everything
works as expected.  Tomorrow I'll reach out on the Slack channel and see if
anybody can help or suggest ways to test it, or if anybody is aware of an
ongoing issue.

On Wed, Aug 28, 2019 at 2:49 PM Michael Carlise <mc...@salesforce.com>
wrote:

> telnet from node 1 -> node2 7001 (and 7000) works.
>
> However, I can't rule out a JKS keystore/truststore issue.  I have tried a
> number of configurations and none of them seemed to help (or emit any
> further error logging).  We have a root and an intermediate CA cert, and a
> private key + signed CSR.  Our keystore has a single PrivateKeyEntry with a
> chain of length 2, consisting of the signed CSR and the intermediate cert (in
> that order).  The truststore has a single entry of length one, consisting of
> the root cert used to issue the intermediate.  Does anybody know if that is
> the correct setup for JKS?  This setup was given to us by another team in our
> company that uses Java much more than we do.
>
> Some other points to note: the CASSANDRA-9386 issue points out that 'dc'
> internode_encryption doesn't work correctly with Ec2MultiRegionSnitch
> (it always uses encrypted connections).  But I still can't get 'all' to work.
> I'm testing by simply flipping encryption on in two non-seed nodes in the same
> datacenter.  In system.log I can see them both output the message 'Handshaking
> with /private IP'.  But then a few minutes later the unable to gossip exception
> is thrown.  No other information/logs are given, so I assume the handshake
> failed, presumably b/c of an incorrect truststore/keystore?
>
> I can't seem to find any concrete information about how to set up the
> keystore cert chain and/or the truststore. Does anybody know of any good
> sources on this topic, or know off the top of their head how this setup is
> supposed to look?

Re: unable to gossip with peers exception when internode encryption is set to any setting other than 'none'

Posted by Michael Carlise <mc...@salesforce.com.INVALID>.
telnet from node 1 -> node2 7001 (and 7000) works.

However, I can't rule out a JKS keystore/truststore issue.  I have tried a
number of configurations and none of them seemed to help (or emit any
further error logging).  We have a root and an intermediate CA cert, and a
private key + signed CSR.  Our keystore has a single PrivateKeyEntry with a
chain of length 2, consisting of the signed CSR and the intermediate cert (in
that order).  The truststore has a single entry of length one, consisting of
the root cert used to issue the intermediate.  Does anybody know if that is
the correct setup for JKS?  This setup was given to us by another team in our
company that uses Java much more than we do.
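
For anyone wanting to confirm a layout like this, the entry types and chain
lengths can be read from keytool -list -v. A small sketch that condenses that
output; the function name chain_summary is hypothetical, and it assumes the
"Alias name:", "Entry type:" and "Certificate chain length:" lines that
keytool prints in verbose mode.

```shell
# Reads `keytool -list -v` output on stdin and prints
# "<alias> <entry type> <chain length>" (chain length is "-" for trusted certs).
chain_summary() {
    awk -F': ' '
        /^Alias name:/               { alias = $2 }
        /^Entry type:/               { type = $2
                                       if (type != "PrivateKeyEntry") print alias, type, "-" }
        /^Certificate chain length:/ { print alias, type, $2 }
    '
}
```

Usage (keystore name as in the build commands, password prompted for):
`keytool -list -v -keystore cassy-us-west-2.jks | chain_summary`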

Some other points to note: the CASSANDRA-9386 issue points out that 'dc'
internode_encryption doesn't work correctly with Ec2MultiRegionSnitch
(it always uses encrypted connections).  But I still can't get 'all' to work.
I'm testing by simply flipping encryption on in two non-seed nodes in the same
datacenter.  In system.log I can see them both output the message 'Handshaking
with /private IP'.  But then a few minutes later the unable to gossip exception
is thrown.  No other information/logs are given, so I assume the handshake
failed, presumably b/c of an incorrect truststore/keystore?

I can't seem to find any concrete information about how to set up the
keystore cert chain and/or the truststore. Does anybody know of any good
sources on this topic, or know off the top of their head how this setup is
supposed to look?


On Mon, Aug 26, 2019 at 10:01 PM Subroto Barua <sb...@yahoo.com.invalid>
wrote:

> Could be an issue with the keystore/truststore. You may want to run
> keytool -list to validate the files/passwords; also run md5sum on the files
> from 1 node in west and 1 node in east.
> Check SSL port 7001: from 1 node in west --> telnet <node in east> 7001
> (or the custom port if you are not using the default port).
>
> On Monday, August 26, 2019, 05:46:19 PM PDT, Michael Carlise
> <mc...@salesforce.com.INVALID> wrote:
>
>
> Subroto -
>
> Both tools error; openssl reports errno 111, which made me check the bound
> ports on the c* node with encryption flipped.  Port 9042 is not open
> (determined by netstat -ant).  I then compared the logs of a node started
> with and without encryption.  Without encryption, I get a bunch of lines like:
>
> OutboundTcpConnection.java:561 - Handshaking version w/ IP
>
> And this happens after a line like
>
> Gossiper.java - Waiting for gossip to settle...
>
> With encryption toggled to 'dc', I don't see any of those lines,
> presumably b/c the gossiper is trying to start but doesn't.
>
> On Mon, Aug 26, 2019 at 6:51 PM Subroto Barua <sb...@yahoo.com.invalid>
> wrote:
>
> Michael,
>
> Are you able to connect to any c* node via OpenSSL?
>
> openssl s_client -connect <ip address>:9042
>
> cqlsh <ip address> --ssl
>
> Subroto
>
> On Aug 26, 2019, at 2:47 PM, Marc Selwan <ma...@datastax.com> wrote:
>
> Which exact version of OpenJDK are you using? Is it possible you don't
> have JCE on those nodes? (I believe more recent versions of Java 8 have this
> baked in, so that might not be it.)
>
>
> *Marc Selwan | *DataStax *| *PM, Server Team *|* *(925) 413-7079* *|*
> Twitter <https://twitter.com/MarcSelwan>
>
>
>
>
>
> Datacenter: us-east
> ===================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
> ?N  52.44.11.221    ?          256          25.4%             null                                  1c
> ...
> ?N  52.204.232.195  ?          256          23.2%             null                                  1d
> Datacenter: us-west-2
> =====================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
> ?N  34.209.2.144    ?          256          26.5%             null                                  2c
> UN  52.40.32.177    105.99 GiB  256          23.7%             null                                  2c
> ?N  34.210.109.203  ?          256          24.7%             null                                  2a
> ...
>
> With the online node being the node with encryption set.
>
> *cqlsh to localhost*
>
> cassy-node6:~$ cqlsh
> Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
>
> *cqlsh to remote node* Remote node is a node with encryption enabled
>
> cassy-node6:~$ cqlsh 10.0.2.7
> Connection error: ('Unable to connect to any servers', {'10.0.2.7': error(111, "Tried connecting to [('10.0.2.7', 9042)]. Last error: Connection refused")})
>
>

Re: unable to gossip with peers exception when internode encryption is set to any setting other than 'none'

Posted by Subroto Barua <sb...@yahoo.com.INVALID>.
 This could be an issue with the keystore/truststore. You may want to run keytool -list to validate the files and passwords. Also run md5sum on the files from one node in west and one node in east to compare them. Then check SSL port 7001 from one node in west: telnet <node in east> 7001 (or the custom port if you are not using the default port).
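
The checks suggested above could be scripted roughly as follows. This is a minimal sketch, not a tested procedure for this cluster: the helper names file_md5, same_file and port_open are hypothetical, the file paths and $STOREPASS are placeholders, and it assumes the files from the two nodes have been copied to one machine for comparison.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names, paths and passwords are placeholders.

# Print only the md5 digest of a file (first field of md5sum output).
file_md5() {
    md5sum "$1" | awk '{print $1}'
}

# Succeed when two files have the same md5 digest.
same_file() {
    [ "$(file_md5 "$1")" = "$(file_md5 "$2")" ]
}

# Succeed when a TCP connection to host ($1) and port ($2) opens within 5 seconds.
# This only shows the port is reachable; it does not attempt a TLS handshake.
port_open() {
    timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example usage (placeholders, not run here):
#   keytool -list -v -keystore generic-server-truststore.jks -storepass "$STOREPASS"
#   same_file west/generic-server-truststore.jks east/generic-server-truststore.jks && echo identical
#   port_open <node in east> 7001 && echo reachable
```

Comparing digests only makes sense for files that are meant to be identical everywhere, such as the shared truststore; the per-region keystores would be expected to differ.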

Re: unable to gossip with peers exception when internode encryption is set to any setting other than 'none'

Posted by Michael Carlise <mc...@salesforce.com.INVALID>.
Subroto -

Both tools fail. openssl returns errno 111 (connection refused), which made
me check the bound ports on the c* node with encryption flipped. Port 9042
is not open (determined by netstat -ant). I then compared the logs for a
node started with and without encryption. Without encryption, I get a bunch
of lines like:

OutboundTcpConnection.java:561 - Handshaking version w/ IP

And this happens after a line like

Gossiper.java - Waiting for gossip to settle...

With encryption toggled to 'dc', I don't see any of those lines, presumably
b/c the gossiper is trying to start but doesn't.
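
The two checks described above (is a port bound, and do the handshake lines appear in the log) could be repeated on each node with something like the sketch below. The function names listening_on and handshake_count are hypothetical, and the log path in the usage comment is an assumption about a default package install.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for repeating the port and log checks.

# Read `netstat -ant` output on stdin and succeed if any socket is in
# LISTEN state on the given local port ($1).
listening_on() {
    awk -v port="$1" '
        $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Print how many "Handshaking version" lines a log file ($1) contains.
handshake_count() {
    grep -c 'Handshaking version' "$1" || true
}

# Example usage (not run here; the log path is an assumption):
#   netstat -ant | listening_on 7001 && echo "internode SSL port bound"
#   netstat -ant | listening_on 9042 || echo "native transport not bound"
#   handshake_count /var/log/cassandra/system.log
```

Since the startup exception is thrown before the node finishes joining, 9042 staying closed is probably a consequence of the gossip failure rather than a separate problem.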


Re: unable to gossip with peers exception when internode encryption is set to any setting other than 'none'

Posted by Subroto Barua <sb...@yahoo.com.INVALID>.
Michael,

Are you able to connect to any c* node via OpenSSL?

openssl s_client -connect <ip address>:9042

cqlsh <ip address> --ssl

Subroto
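
A sketch of how the OpenSSL probe might be wrapped so it prints only the certificate details. The function names cert_summary and probe_tls are hypothetical. Because client_encryption_options is disabled in the posted configuration, port 9042 would be expected to speak plaintext, so the probe is probably more informative against the internode SSL port (7001 in the posted log).

```shell
#!/usr/bin/env bash
# Hypothetical helpers around openssl; host and port values are placeholders.

# Read a PEM certificate on stdin and print its subject, issuer and validity dates.
cert_summary() {
    openssl x509 -noout -subject -issuer -dates
}

# Attempt a TLS handshake with host ($1) and port ($2) and summarise the
# certificate the server presents. Fails if no certificate is received.
probe_tls() {
    openssl s_client -connect "$1:$2" </dev/null 2>/dev/null | cert_summary
}

# Example usage (not run here):
#   probe_tls <ip address> 7001
```

If the handshake succeeds, the printed subject and issuer can be compared with the entries shown by keytool -list for the keystore on that node.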

> On Aug 26, 2019, at 2:47 PM, Marc Selwan <ma...@datastax.com> wrote:
> 
> Which exact version of OpenJDK are you using? Is it possible you don't have JCE on those nodes? (I believe more recent versions of Java 8 have this baked in, so that might not be it.)
> 
> 
> Marc Selwan | DataStax | PM, Server Team | (925) 413-7079
> 
> 
> 
>> On Mon, Aug 26, 2019 at 1:56 PM Michael Carlise <mc...@salesforce.com.invalid> wrote:
>> 
>> I originally opened this issue on stackoverflow (https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception).  
>> 
>> However, I haven't gotten any responses in over a week.  I'm going to post it here and maybe someone will have an idea on where I can look.
>> 
>> We currently run a multi region cassandra cluster in AWS. It runs in four regions, 12 nodes per region. It runs without node to node encryption (or client encryption either). We are trying to enable inter datacenter node to node encryption. However, when we flip encryption over we get an exception that nodes are unable to gossip with any peers.
>> 
>> It could possibly be that we didn't build our jks keystore/truststores correctly (more on how we built these files below). But, we additionally do not see intra datacenter communication working (which should be set to unencrypted communication). Additionally, cqlsh cannot connect to the node either; even though we have (by default) client_auth_required set to false.
>> 
>> ERROR [main] 2019-08-15 18:46:32,241 CassandraDaemon.java:749 - Exception encountered during startup
>> java.lang.RuntimeException: Unable to gossip with any peers
>>         at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1435) ~[apache-cassandra-3.11.4.jar:3.11.4]
>>         at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:566) ~[apache-cassandra-3.11.4.jar:3.11.4]
>>         at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823) ~[apache-cassandra-3.11.4.jar:3.11.4]
>>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:683) ~[apache-cassandra-3.11.4.jar:3.11.4]
>>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:632) ~[apache-cassandra-3.11.4.jar:3.11.4]
>>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) [apache-cassandra-3.11.4.jar:3.11.4]
>>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620) [apache-cassandra-3.11.4.jar:3.11.4]
>>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) [apache-cassandra-3.11.4.jar:3.11.4]
>> INFO  [main] 2019-08-15 18:47:07,384 YamlConfigurationLoader.java:89 - Configuration location: file:/etc/cassandra/cassandra.yaml
>> 
>> Something to note is that this error message occurs after a few minutes of the node being up. (i.e. there is a delay between start up before this exception is thrown).
>> 
>> Information about our cassandra setup
>> 
>> cassandra version: 3.11.4
>> JDK version: openjdk-8.
>> Linux: Ubuntu 18.04 (bionic).
>> 
>> cassandra.yaml
>> 
>> endpoint_snitch: Ec2MultiRegionSnitch
>> 
>> server_encryption_options:
>>   internode_encryption: dc
>>   keystore: <omitted>
>>   keystore_password: <omitted>
>>   truststore: <omitted>
>>   truststore_password: <omitted>
>> 
>> client_encryption_options:
>>   enabled: false
>> 
>> cassandra-rackdc.properties
>> 
>> prefer_local=true
>> 
>> No obvious errors with SSH output
>> 
>> When starting cassandra with JVM_OPTS="$JVM_OPTS -Djavax.net.debug=ssl" added to cassandra-env.sh we see SSL logs printed to stdout (Note: Subject and Issuer were omitted on purpose).
>> 
>> found key for : cassy-us-west-2                                                                                                                                                                                                       
>> adding as trusted cert:                                                                                                                                                                                                               
>>   Subject: ...                                                                                                                                                      
>>   Issuer:  ...                                                                                                                                                      
>>   Algorithm: RSA; Serial number: 0xdad28d843fc73325d4c1a75207d4e74                                                                                                                                                                    
>>   Valid from Fri May 27 00:00:00 UTC 2016 until Tue May 26 23:59:59 UTC 2026  
>> 
>> ...
>> 
>> trigger seeding of SecureRandom
>> done seeding SecureRandom
>> 
>> Looking at Java SE SSL/TLS connection debugging, this looks correct. But to note, we see this series of messages (along with the RSA key signature output) repeated several times in rapid fire. We never observe any messages about the trust store being added; however that might be something that occurs only on client initiation (?)
>> 
>> Additionally, we do see cassandra report that the Encrypted Messaging service has been started.
>> 
>> INFO  [main] 2019-08-15 18:45:31,022 MessagingService.java:704 - Starting Encrypted Messaging Service on SSL port 7001
>> 
>> Doesn't appear to be a cassandra.yaml configuration problem
>> 
>> We can bring the node back online by simply configuring internode_encryption: none. This action seems to rule out a broadcast_address or rpc_address configuration problem.
>> 
>> How we built our keystore/truststores
>> 
>> We followed the basic template datastax docs for preparing SSL certificates. One minor difference was that our private key and CSRs were generated using openssl. One per each region (we plan to share key/signed certs across nodes in regions). This was created using a command template as:
>> 
>> openssl req -new -newkey rsa:2048 -out cassy-<region>.csr -keyout cassy-<region>.key -config cassy-<region>.conf -subj "..." -nodes -sha256
>>
>> The generated CSR was then signed by an internal root CA. Because we generated our files using openssl, we had to build our jks files by importing our certs into them.
>> 
>> Commands to generate truststore
>> 
>> We distribute this one file to all nodes.
>> 
>> keytool -importcert \
>>     -keystore generic-server-truststore.jks \
>>     -alias rootCa \
>>     -file rootCa.crt \
>>     -noprompt \
>>     -keypass omitted \
>>     -storepass omitted
>>
>> Commands to generate keystore
>> 
>> This was done one per region; but essentially we created a keystore with keytool, then deleted the key entry and then imported our key entry using keytool from a pkcs12 file.
>> 
>> keytool -genkeypair -keyalg RSA -alias cassy-${region} -keystore cassy-${region}.jks -storepass omitted -keypass omitted -validity 365 -keysize 2048 -dname "..." 
>> 
>> keytool -delete -alias cassy-${region} -keystore cassy-${region}.jks -storepass omitted
>> 
>> openssl pkcs12 -export -in signed_certs/${region}.pem -inkey keys/cassandra.${region}.key -name cassy-${region} -out ${region}.p12 
>> 
>> keytool -importkeystore -deststorepass omitted -destkeystore cassy-${region}.jks -srckeystore ${region}.p12 -srcstoretype PKCS12 
>> 
>> keytool -importcert -keystore cassy-${region}.jks -alias rootCa -file ca.crt -noprompt -keypass omitted -storepass omitted
>>
>> Looking back at this, I don't remember why we used keytool to generate a keypair/keystore, then deleted and imported. I think it was because the keytool importkeystore command refused to run if the keystore didn't already exist.
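
One way to sanity-check a keystore assembled like this is to list its entries and confirm that the expected alias holds a private key together with a certificate chain. The following is only a minimal sketch, assuming a JKS file; the class name KeystoreSummary and the command-line arguments (keystore path, store password) are illustrative and not taken from the original post.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class KeystoreSummary {
    // Returns one line per alias: the entry type and, for key entries,
    // the number of certificates in the attached chain.
    static List<String> describe(KeyStore ks) throws Exception {
        List<String> lines = new ArrayList<>();
        for (String alias : Collections.list(ks.aliases())) {
            if (ks.isKeyEntry(alias)) {
                Certificate[] chain = ks.getCertificateChain(alias);
                int len = (chain == null) ? 0 : chain.length;
                lines.add(alias + ": key entry, chain length " + len);
            } else {
                lines.add(alias + ": trusted certificate");
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: java KeystoreSummary <path-to-jks> <store-password>
        KeyStore ks = KeyStore.getInstance("JKS");
        try (InputStream in = new FileInputStream(args[0])) {
            ks.load(in, args[1].toCharArray());
        }
        describe(ks).forEach(System.out::println);
    }
}
```

If the key entry is missing, or its chain is shorter than expected, that would be worth ruling out before looking further at Cassandra itself.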
>> 
>> ca.crt and pem file
>> 
>> The ca.crt file contains the root certificate and the intermediate certificate that was used to sign the CSR. The pem file contains the signed CSR returned to us, the intermediate cert, and the root CA (in that order).
>> 
>> openssl verify ca.crt and pem
>> 
>> openssl verify -CAfile ca.crt us-west-2.pem
>> signed_certs/us-west-2.pem: OK
>>
>> Command output after enabling encryption
>> 
>> nodetool status (output truncated)
>> 
>> Datacenter: us-east
>> ===================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
>> ?N  52.44.11.221    ?          256          25.4%             null                                  1c
>> ...
>> ?N  52.204.232.195  ?          256          23.2%             null                                  1d
>> Datacenter: us-west-2
>> =====================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
>> ?N  34.209.2.144    ?          256          26.5%             null                                  2c
>> UN  52.40.32.177    105.99 GiB  256          23.7%             null                                  2c
>> ?N  34.210.109.203  ?          256          24.7%             null                                  2a
>> ...
>>
>> With the online node being the node with encryption set.
>> 
>> cqlsh to localhost
>> 
>> cassy-node6:~$ cqlsh
>> Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
>>
>> cqlsh to remote node
>>
>> Remote node is a node with encryption enabled
>> 
>> cassy-node6:~$ cqlsh 10.0.2.7
>> Connection error: ('Unable to connect to any servers', {'10.0.2.7': error(111, "Tried connecting to [('10.0.2.7', 9042)]. Last error: Connection refused")})
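
"Connection refused" means nothing accepted the TCP connection on 9042, which would be consistent with the node aborting startup before the client port is opened. A plain TCP probe can separate "nothing is listening" from a TLS-level failure. This is a rough sketch only: the class name PortProbe is made up for illustration, 7001 and 9042 are the ports that appear in the output above, and 7000 is assumed to be the unencrypted storage_port default.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    // True if a plain TCP connection to host:port succeeds within timeoutMs.
    // This says nothing about whether a TLS handshake would succeed afterwards.
    static boolean isOpen(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical usage: java PortProbe <host>
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        for (int port : new int[] {7000, 7001, 9042}) {
            System.out.println(host + ":" + port + " open = " + isOpen(host, port, 2000));
        }
    }
}
```

If 7001 is reachable from a peer but gossip still fails, the problem is more likely in the handshake than in the network path.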

Re: unable to gossip with peers exception when internode encryption is set to any setting other than 'none'

Posted by Michael Carlise <mc...@salesforce.com.INVALID>.
The version given by apt is 8u162-b12-1, which I think corresponds to
OpenJDK 8u162. When I run

jrunscript -e 'print(javax.crypto.Cipher.getMaxAllowedKeyLength("RC5") >= 256);'

the command returns true. I'm not sure whether that is the best way to
verify that JCE is installed.
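
The same check can also be run as a small Java program on the node. This is a minimal sketch, assuming "JCE" here refers to the unlimited-strength policy; it queries AES instead of RC5 because AES is the cipher behind 256-bit TLS suites, and the class name JceCheck is illustrative only.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

public class JceCheck {
    // True when the installed crypto policy permits AES keys of at least 256 bits.
    static boolean allowsAes256() throws NoSuchAlgorithmException {
        return Cipher.getMaxAllowedKeyLength("AES") >= 256;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Print the exact runtime version alongside the policy limit.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("max AES key length = " + Cipher.getMaxAllowedKeyLength("AES"));
        System.out.println("AES-256 allowed = " + allowsAes256());
    }
}
```

According to the Javadoc, getMaxAllowedKeyLength returns Integer.MAX_VALUE when the unlimited policy is in effect, so a very large number in the second line is expected in that case. It should be run with the same JVM that Cassandra uses.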


Michael Carlise

On Mon, Aug 26, 2019 at 5:47 PM Marc Selwan <ma...@datastax.com>
wrote:

> Which exact version of OpenJDK are you using? Is it possible you don't
> have JCE on those nodes? (I believe more recent versions of Java 8 have
> this baked in, so that might not be it.)
>

Re: unable to gossip with peers exception when internode encryption is set to any setting other than 'none'

Posted by Marc Selwan <ma...@datastax.com>.
Which exact version of OpenJDK are you using? Is it possible you don't have
JCE on those nodes? (I believe more recent versions of Java 8 have this
baked in, so that might not be it.)


*Marc Selwan | *DataStax *| *PM, Server Team *|* *(925) 413-7079* *|*
Twitter <https://twitter.com/MarcSelwan>



