You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@cassandra.apache.org by Tommy Stendahl <to...@ericsson.com> on 2018/10/23 09:44:03 UTC

Testing 4.0 upgrading

Hi,

I have been testing upgrade to 4.0, I started out with a cluster with 
3.0.15 and server encryption enabled. Due to some issues in my 
environment I did upgrade one of the nodes to 3.11.3, I think this 
turned out to be a good thing since I could observer the behaviour of 
upgrading from both 3.0.15 and 3.11.3 at the same time.

At first I didn't have much success at all, it look like found multiple 
issues mostly with server encryption so I decided to simplify thing and 
disabled server encryption.

So with server encryption disabled the upgrade was working ok, what I 
did notice was exceptions in the 3.0.15 and 3.11.3 nodes once the first 
4.0 node started.

3.0.15 exception:
2018-10-22T11:05:38.883+0200 ERROR 
[MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:223 
Exception in thread Thread[MessagingService-Incoming-/10.216.193.244,5,main]
java.lang.RuntimeException: Unknown column coordinator_port during 
deserialization
         at 
org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433) 
~[apache-cassandra-3.0.15.jar:3.0.15]
         at 
org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:447) 
~[apache-cassandra-3.0.15.jar:3.0.15]
         at 
org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:647) 
~[apache-cassandra-3.0.15.jar:3.0.15]
         at 
org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:584) 
~[apache-cassandra-3.0.15.jar:3.0.15]
         at 
org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50) 
~[apache-cassandra-3.0.15.jar:3.0.15]
         at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) 
~[apache-cassandra-3.0.15.jar:3.0.15]
         at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) 
~[apache-cassandra-3.0.15.jar:3.0.15]
         at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) 
~[apache-cassandra-3.0.15.jar:3.0.15]
         at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) 
~[apache-cassandra-3.0.15.jar:3.0.15]

3.11.3 exception:
2018-10-22T11:12:05.060+0200 ERROR 
[MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:228 
Exception in thread Thread[MessagingService-Incoming-/10.216.193.244,5,main]
java.lang.RuntimeException: Unknown column coordinator_port during 
deserialization
         at 
org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:452) 
~[apache-cassandra-3.11.3.jar:3.11.3]
         at 
org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:482) 
~[apache-cassandra-3.11.3.jar:3.11.3]
         at 
org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:760) 
~[apache-cassandra-3.11.3.jar:3.11.3]
         at 
org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:697) 
~[apache-cassandra-3.11.3.jar:3.11.3]
         at 
org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50) 
~[apache-cassandra-3.11.3.jar:3.11.3]
         at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123) 
~[apache-cassandra-3.11.3.jar:3.11.3]
         at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:192) 
~[apache-cassandra-3.11.3.jar:3.11.3]
         at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:180) 
~[apache-cassandra-3.11.3.jar:3.11.3]
         at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94) 
~[apache-cassandra-3.11.3.jar:3.11.3]

I got one or two of these exceptions every minute, I could not see that 
this was causing any problems but I don't really like exceptions in the 
log. Should we expect exceptions like this during an upgrade or is this 
a problem we should look in to?

So now I enabled server encryption again, with this configuration:
storage_port: 12700
ssl_storage_port: 12701
server_encryption_options:
     # set to true for allowing secure incoming connections
     enabled: true
     # If enabled and optional are both set to true, encrypted and 
unencrypted connections are handled on the storage_port
     optional: false
     # if enabled, will open up an encrypted listening socket on 
ssl_storage_port. Should be used
     # during upgrade to 4.0; otherwise, set to false.
     enable_legacy_ssl_storage_port: true
     # on outbound connections, determine which type of peers to 
securely connect to. 'enabled' must be set to true.
     internode_encryption: all
     keystore: /usr/share/cassandra/.ssl/cil-intern/server/keystore.jks
     keystore_password: '*********'
     truststore: /usr/share/cassandra/.ssl/cil-intern/server/truststore.jks
     truststore_password: '**********'
     # More advanced defaults below:
     protocol: TLSv1.2
     # store_type: JKS
     cipher_suites: 
[TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA]
     # require_client_auth: false
     # require_endpoint_verification: false

As far as I understand this configuration is correct but the 4.0 node 
can't contact the old nodes. I have tried with the default protocol 
setting also but it makes no difference. The behaviour is different in 
3.0.15 and 3.11.3, in 3.0.15 I get this exception:
2018-10-22T11:57:44.366+0200 ERROR 
[MessagingService-NettyInbound-Thread-3-8] 
InboundHandshakeHandler.java:300 Failed to properly handshake with peer 
/10.216.193.246:58296. Closing the channel.
io.netty.handler.codec.DecoderException: 
javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled
         at 
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459)
         at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
         at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
         at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
         at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
         at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
         at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
         at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
         at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
         at 
io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:808)
         at 
io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:417)
         at 
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:317)
         at 
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
         at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
         at java.lang.Thread.run(Thread.java:748)
Caused by: javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled
         at 
sun.security.ssl.InputRecord.handleUnknownRecord(InputRecord.java:637)
         at sun.security.ssl.InputRecord.read(InputRecord.java:527)
         at 
sun.security.ssl.EngineInputRecord.read(EngineInputRecord.java:382)
         at 
sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:962)
         at 
sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907)
         at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
         at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
         at 
io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:294)
         at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1275)
         at 
io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1177)
         at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1221)
         at 
io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
         at 
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
         ... 14 common frames omitted

and in the 4.0 debug log I see this related to the 3.0.15 node:
2018-10-22T13:25:34.414+0200 DEBUG [ScheduledFastTasks:1] 
OutboundMessagingConnection.java:259 connection attempt 788 to 
10.216.193.246:12700 (GOSSIP)
2018-10-22T13:25:34.414+0200 DEBUG [ScheduledFastTasks:1] 
NettyFactory.java:321 creating outbound bootstrap to peer 
10.216.193.246:12700, compression: true, encryption: enabled (jdk), 
coalesce: DISABLED, protocolV
ersion: 12
2018-10-22T13:25:34.526+0200 DEBUG [GossipTasks:1] 
OutboundMessagingConnection.java:259 connection attempt 789 to 
10.216.193.246:12700 (GOSSIP)
2018-10-22T13:25:34.527+0200 DEBUG [GossipTasks:1] NettyFactory.java:321 
creating outbound bootstrap to peer 10.216.193.246:12700, compression: 
true, encryption: enabled (jdk), coalesce: DISABLED, protocolVersion:
  12

In the 3.11.3 node I don't see any errors, I do see this:
2018-10-22T13:20:36.270+0200  INFO [HANDSHAKE-/10.216.193.244] 
OutboundTcpConnection.java:561 Handshaking version with /10.216.193.244

and in the 4.0 node I find this:
2018-10-22T13:20:36.270+0200  INFO 
[MessagingService-NettyInbound-Thread-3-1] 
InboundHandshakeHandler.java:146 connection from peer 
/10.216.193.243:54262, protocol = TLSv1.2, cipher suite = 
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
2018-10-22T13:20:41.147+0200 DEBUG [main] 
OutboundMessagingConnection.java:259 connection attempt 0 to 
10.216.193.243:12700 (GOSSIP)
2018-10-22T13:20:41.147+0200 DEBUG [main] NettyFactory.java:321 creating 
outbound bootstrap to peer 10.216.193.243:12700, compression: true, 
encryption: enabled (jdk), coalesce: DISABLED, protocolVersion: 11
2018-10-22T13:20:41.248+0200 DEBUG [ScheduledFastTasks:1] 
OutboundMessagingConnection.java:259 connection attempt 1 to 
10.216.193.243:12700 (GOSSIP)
2018-10-22T13:20:41.248+0200 DEBUG [ScheduledFastTasks:1] 
NettyFactory.java:321 creating outbound bootstrap to peer 
10.216.193.243:12700, compression: true, encryption: enabled (jdk), 
coalesce: DISABLED, protocolV
ersion: 11

It looks like the 4.0 node doesn’t accept incoming messages from the old 
nodes on the ssl_storage_port and continues to try to contact the old 
nodes on the storage_port that are disabled in the old nodes. I don't 
really know what to make of this, any ideas or suggestions?

Regards,
Tommy



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Testing 4.0 upgrading

Posted by Nate McCall <zz...@gmail.com>.
> Now I will reinstall the old version and do the upgrade again and
> hopefully I can have more information soon.
>

Hi Tommy,
Thanks for taking the time to brave an upgrade test this early on - it
is super helpful to get this feedback.

Anyone else that has bandwidth, we very much appreciate these types of
tests, so please keep them coming.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Testing 4.0 upgrading

Posted by Tommy Stendahl <to...@ericsson.com>.
Hi Jason,

Thanks for responding.

I have created two jiras for these things: CASSANDRA-14841 and 
CASSANDRA-14842.

I understand the initial 4.0->3.x connection problem, since the incoming 
connection from the old node isn't accepted the 4.0 node will never know 
that it tries to connect to a old node and continuous to try to connect 
on the storage_port instead on the ssl_storage_port.

I will enable SSL debugging and activate wire tracing and see if I can 
find out more.

I can also add that I have upgraded all nodes in the cluster to 4.0 and 
once a node was upgraded to 4.0 it started talking to the other 4.0 node 
as expected. I had data in the cluster from before the upgrade and after 
doing "nodetool upgradesstables" there where no issues restarting the 
clients.

Now I will reinstall the old version and do the upgrade again and 
hopefully I can have more information soon.

/Tommy


On 2018-10-23 19:12, Jason Brown wrote:
> Hi Tommy,
>
> Thank you for taking the time to start kicking the tires on the upgrade to
> 4.0. It looks like you've found two bugs:
>
> 1) "Unknown column coordinator_port during deserialization" (reported on
> 3.x nodes)
>
> - looks like the 4.0 node isn't filtering out a column from a system table
> that 3.0 doesn't know about. Most likely due to CASSANDRA-7544. Can you
> open a JIRA for this, and tag @aweisberg?
>
> 2) SSL connection problems
>
> I unserstand the 4.0 -> 3.X connection problem, and documented it at [1] in
> MessagingService. TL;DR we don't know the version of a peer when restarting
> and need to wait for that peer to connect to the local node and pass it's
> correct messaging version (if the local node cannot interpret the handhsake
> from the peer). However, why for the inbound connection to the 4.0 node it
> is seeing SSLv2 is unclear. Can you open a separate JIRA, and we'll go from
> there? In the meantime, maybe enable the JDK's SSL debugging [2] on the 3.x
> node to see exactly what it is trying to do? Also, you can enable wire
> tracing on the 4.0 node by setting this value to true [3] and recompiling.
> We can followup further in the jira.
>
> Thanks!
>
> -Jason
>
> [1]
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/MessagingService.java#L1668
> [2]
> https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/ReadDebug.html
> [3]
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/async/NettyFactory.java#L84
>
> On Tue, Oct 23, 2018 at 2:44 AM Tommy Stendahl <to...@ericsson.com>
> wrote:
>
>> Hi,
>>
>> I have been testing upgrade to 4.0, I started out with a cluster with
>> 3.0.15 and server encryption enabled. Due to some issues in my
>> environment I did upgrade one of the nodes to 3.11.3, I think this
>> turned out to be a good thing since I could observer the behaviour of
>> upgrading from both 3.0.15 and 3.11.3 at the same time.
>>
>> At first I didn't have much success at all, it look like found multiple
>> issues mostly with server encryption so I decided to simplify thing and
>> disabled server encryption.
>>
>> So with server encryption disabled the upgrade was working ok, what I
>> did notice was exceptions in the 3.0.15 and 3.11.3 nodes once the first
>> 4.0 node started.
>>
>> 3.0.15 exception:
>> 2018-10-22T11:05:38.883+0200 ERROR
>> [MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:223
>> Exception in thread Thread[MessagingService-Incoming-/10.216.193.244
>> ,5,main]
>> java.lang.RuntimeException: Unknown column coordinator_port during
>> deserialization
>>           at
>> org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433)
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>>           at
>> org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:447)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>>           at
>> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:647)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>>           at
>> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:584)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>>           at
>> org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>>           at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98)
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>>           at
>> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>>           at
>> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>>           at
>> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>>
>> 3.11.3 exception:
>> 2018-10-22T11:12:05.060+0200 ERROR
>> [MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:228
>> Exception in thread Thread[MessagingService-Incoming-/10.216.193.244
>> ,5,main]
>> java.lang.RuntimeException: Unknown column coordinator_port during
>> deserialization
>>           at
>> org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:452)
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>>           at
>> org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:482)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>>           at
>> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:760)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>>           at
>> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:697)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>>           at
>> org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>>           at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123)
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>>           at
>> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:192)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>>           at
>> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:180)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>>           at
>> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>>
>> I got one or two of these exceptions every minute, I could not see that
>> this was causing any problems but I don't really like exceptions in the
>> log. Should we expect exceptions like this during an upgrade or is this
>> a problem we should look in to?
>>
>> So now I enabled server encryption again, with this configuration:
>> storage_port: 12700
>> ssl_storage_port: 12701
>> server_encryption_options:
>>       # set to true for allowing secure incoming connections
>>       enabled: true
>>       # If enabled and optional are both set to true, encrypted and
>> unencrypted connections are handled on the storage_port
>>       optional: false
>>       # if enabled, will open up an encrypted listening socket on
>> ssl_storage_port. Should be used
>>       # during upgrade to 4.0; otherwise, set to false.
>>       enable_legacy_ssl_storage_port: true
>>       # on outbound connections, determine which type of peers to
>> securely connect to. 'enabled' must be set to true.
>>       internode_encryption: all
>>       keystore: /usr/share/cassandra/.ssl/cil-intern/server/keystore.jks
>>       keystore_password: '*********'
>>       truststore: /usr/share/cassandra/.ssl/cil-intern/server/truststore.jks
>>       truststore_password: '**********'
>>       # More advanced defaults below:
>>       protocol: TLSv1.2
>>       # store_type: JKS
>>       cipher_suites:
>>
>> [TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA]
>>       # require_client_auth: false
>>       # require_endpoint_verification: false
>>
>> As far as I understand this configuration is correct but the 4.0 node
>> can't contact the old nodes. I have tried with the default protocol
>> setting also but it makes no difference. The behaviour is different in
>> 3.0.15 and 3.11.3, in 3.0.15 I get this exception:
>> 2018-10-22T11:57:44.366+0200 ERROR
>> [MessagingService-NettyInbound-Thread-3-8]
>> InboundHandshakeHandler.java:300 Failed to properly handshake with peer
>> /10.216.193.246:58296. Closing the channel.
>> io.netty.handler.codec.DecoderException:
>> javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled
>>           at
>>
>> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459)
>>           at
>>
>> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
>>           at
>>
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>>           at
>>
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>>           at
>>
>> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
>>           at
>>
>> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
>>           at
>>
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>>           at
>>
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>>           at
>>
>> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
>>           at
>>
>> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:808)
>>           at
>> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:417)
>>           at
>> io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:317)
>>           at
>>
>> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
>>           at
>>
>> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>>           at java.lang.Thread.run(Thread.java:748)
>> Caused by: javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled
>>           at
>> sun.security.ssl.InputRecord.handleUnknownRecord(InputRecord.java:637)
>>           at sun.security.ssl.InputRecord.read(InputRecord.java:527)
>>           at
>> sun.security.ssl.EngineInputRecord.read(EngineInputRecord.java:382)
>>           at
>> sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:962)
>>           at
>> sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907)
>>           at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
>>           at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
>>           at
>> io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:294)
>>           at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1275)
>>           at
>> io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1177)
>>           at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1221)
>>           at
>>
>> io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
>>           at
>>
>> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
>>           ... 14 common frames omitted
>>
>> and in the 4.0 debug log I see this related to the 3.0.15 node:
>> 2018-10-22T13:25:34.414+0200 DEBUG [ScheduledFastTasks:1]
>> OutboundMessagingConnection.java:259 connection attempt 788 to
>> 10.216.193.246:12700 (GOSSIP)
>> 2018-10-22T13:25:34.414+0200 DEBUG [ScheduledFastTasks:1]
>> NettyFactory.java:321 creating outbound bootstrap to peer
>> 10.216.193.246:12700, compression: true, encryption: enabled (jdk),
>> coalesce: DISABLED, protocolV
>> ersion: 12
>> 2018-10-22T13:25:34.526+0200 DEBUG [GossipTasks:1]
>> OutboundMessagingConnection.java:259 connection attempt 789 to
>> 10.216.193.246:12700 (GOSSIP)
>> 2018-10-22T13:25:34.527+0200 DEBUG [GossipTasks:1] NettyFactory.java:321
>> creating outbound bootstrap to peer 10.216.193.246:12700, compression:
>> true, encryption: enabled (jdk), coalesce: DISABLED, protocolVersion:
>>    12
>>
>> In the 3.11.3 node I don't see any errors, I do see this:
>> 2018-10-22T13:20:36.270+0200  INFO [HANDSHAKE-/10.216.193.244]
>> OutboundTcpConnection.java:561 Handshaking version with /10.216.193.244
>>
>> and in the 4.0 node I find this:
>> 2018-10-22T13:20:36.270+0200  INFO
>> [MessagingService-NettyInbound-Thread-3-1]
>> InboundHandshakeHandler.java:146 connection from peer
>> /10.216.193.243:54262, protocol = TLSv1.2, cipher suite =
>> TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
>> 2018-10-22T13:20:41.147+0200 DEBUG [main]
>> OutboundMessagingConnection.java:259 connection attempt 0 to
>> 10.216.193.243:12700 (GOSSIP)
>> 2018-10-22T13:20:41.147+0200 DEBUG [main] NettyFactory.java:321 creating
>> outbound bootstrap to peer 10.216.193.243:12700, compression: true,
>> encryption: enabled (jdk), coalesce: DISABLED, protocolVersion: 11
>> 2018-10-22T13:20:41.248+0200 DEBUG [ScheduledFastTasks:1]
>> OutboundMessagingConnection.java:259 connection attempt 1 to
>> 10.216.193.243:12700 (GOSSIP)
>> 2018-10-22T13:20:41.248+0200 DEBUG [ScheduledFastTasks:1]
>> NettyFactory.java:321 creating outbound bootstrap to peer
>> 10.216.193.243:12700, compression: true, encryption: enabled (jdk),
>> coalesce: DISABLED, protocolV
>> ersion: 11
>>
>> It looks like the 4.0 node doesn’t accept incoming messages from the old
>> nodes on the ssl_storage_port and continues to try to contact the old
>> nodes on the storage_port that are disabled in the old nodes. I don't
>> really know what to make of this, any ideas or suggestions?
>>
>> Regards,
>> Tommy
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>
>>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Testing 4.0 upgrading

Posted by Jason Brown <ja...@gmail.com>.
Hi Tommy,

Thank you for taking the time to start kicking the tires on the upgrade to
4.0. It looks like you've found two bugs:

1) "Unknown column coordinator_port during deserialization" (reported on
3.x nodes)

- looks like the 4.0 node isn't filtering out a column from a system table
that 3.0 doesn't know about. Most likely due to CASSANDRA-7544. Can you
open a JIRA for this, and tag @aweisberg?

2) SSL connection problems

I unserstand the 4.0 -> 3.X connection problem, and documented it at [1] in
MessagingService. TL;DR we don't know the version of a peer when restarting
and need to wait for that peer to connect to the local node and pass it's
correct messaging version (if the local node cannot interpret the handhsake
from the peer). However, why for the inbound connection to the 4.0 node it
is seeing SSLv2 is unclear. Can you open a separate JIRA, and we'll go from
there? In the meantime, maybe enable the JDK's SSL debugging [2] on the 3.x
node to see exactly what it is trying to do? Also, you can enable wire
tracing on the 4.0 node by setting this value to true [3] and recompiling.
We can followup further in the jira.

Thanks!

-Jason

[1]
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/MessagingService.java#L1668
[2]
https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/ReadDebug.html
[3]
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/async/NettyFactory.java#L84

On Tue, Oct 23, 2018 at 2:44 AM Tommy Stendahl <to...@ericsson.com>
wrote:

> Hi,
>
> I have been testing upgrade to 4.0, I started out with a cluster with
> 3.0.15 and server encryption enabled. Due to some issues in my
> environment I did upgrade one of the nodes to 3.11.3, I think this
> turned out to be a good thing since I could observer the behaviour of
> upgrading from both 3.0.15 and 3.11.3 at the same time.
>
> At first I didn't have much success at all, it look like found multiple
> issues mostly with server encryption so I decided to simplify thing and
> disabled server encryption.
>
> So with server encryption disabled the upgrade was working ok, what I
> did notice was exceptions in the 3.0.15 and 3.11.3 nodes once the first
> 4.0 node started.
>
> 3.0.15 exception:
> 2018-10-22T11:05:38.883+0200 ERROR
> [MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:223
> Exception in thread Thread[MessagingService-Incoming-/10.216.193.244
> ,5,main]
> java.lang.RuntimeException: Unknown column coordinator_port during
> deserialization
>          at
> org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433)
> ~[apache-cassandra-3.0.15.jar:3.0.15]
>          at
> org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:447)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
>          at
> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:647)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
>          at
> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:584)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
>          at
> org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
>          at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98)
> ~[apache-cassandra-3.0.15.jar:3.0.15]
>          at
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
>          at
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
>          at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
>
> 3.11.3 exception:
> 2018-10-22T11:12:05.060+0200 ERROR
> [MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:228
> Exception in thread Thread[MessagingService-Incoming-/10.216.193.244
> ,5,main]
> java.lang.RuntimeException: Unknown column coordinator_port during
> deserialization
>          at
> org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:452)
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>          at
> org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:482)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>          at
> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:760)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>          at
> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:697)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>          at
> org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>          at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123)
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>          at
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:192)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>          at
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:180)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>          at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>
> I got one or two of these exceptions every minute, I could not see that
> this was causing any problems but I don't really like exceptions in the
> log. Should we expect exceptions like this during an upgrade or is this
> a problem we should look in to?
>
> So now I enabled server encryption again, with this configuration:
> storage_port: 12700
> ssl_storage_port: 12701
> server_encryption_options:
>      # set to true for allowing secure incoming connections
>      enabled: true
>      # If enabled and optional are both set to true, encrypted and
> unencrypted connections are handled on the storage_port
>      optional: false
>      # if enabled, will open up an encrypted listening socket on
> ssl_storage_port. Should be used
>      # during upgrade to 4.0; otherwise, set to false.
>      enable_legacy_ssl_storage_port: true
>      # on outbound connections, determine which type of peers to
> securely connect to. 'enabled' must be set to true.
>      internode_encryption: all
>      keystore: /usr/share/cassandra/.ssl/cil-intern/server/keystore.jks
>      keystore_password: '*********'
>      truststore: /usr/share/cassandra/.ssl/cil-intern/server/truststore.jks
>      truststore_password: '**********'
>      # More advanced defaults below:
>      protocol: TLSv1.2
>      # store_type: JKS
>      cipher_suites:
>
> [TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA]
>      # require_client_auth: false
>      # require_endpoint_verification: false
>
> As far as I understand this configuration is correct but the 4.0 node
> can't contact the old nodes. I have tried with the default protocol
> setting also but it makes no difference. The behaviour is different in
> 3.0.15 and 3.11.3, in 3.0.15 I get this exception:
> 2018-10-22T11:57:44.366+0200 ERROR
> [MessagingService-NettyInbound-Thread-3-8]
> InboundHandshakeHandler.java:300 Failed to properly handshake with peer
> /10.216.193.246:58296. Closing the channel.
> io.netty.handler.codec.DecoderException:
> javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled
>          at
>
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459)
>          at
>
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
>          at
>
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>          at
>
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>          at
>
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
>          at
>
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
>          at
>
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>          at
>
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>          at
>
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
>          at
>
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:808)
>          at
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:417)
>          at
> io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:317)
>          at
>
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
>          at
>
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>          at java.lang.Thread.run(Thread.java:748)
> Caused by: javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled
>          at
> sun.security.ssl.InputRecord.handleUnknownRecord(InputRecord.java:637)
>          at sun.security.ssl.InputRecord.read(InputRecord.java:527)
>          at
> sun.security.ssl.EngineInputRecord.read(EngineInputRecord.java:382)
>          at
> sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:962)
>          at
> sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907)
>          at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
>          at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
>          at
> io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:294)
>          at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1275)
>          at
> io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1177)
>          at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1221)
>          at
>
> io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
>          at
>
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
>          ... 14 common frames omitted
>
> and in the 4.0 debug log I see this related to the 3.0.15 node:
> 2018-10-22T13:25:34.414+0200 DEBUG [ScheduledFastTasks:1]
> OutboundMessagingConnection.java:259 connection attempt 788 to
> 10.216.193.246:12700 (GOSSIP)
> 2018-10-22T13:25:34.414+0200 DEBUG [ScheduledFastTasks:1]
> NettyFactory.java:321 creating outbound bootstrap to peer
> 10.216.193.246:12700, compression: true, encryption: enabled (jdk),
> coalesce: DISABLED, protocolV
> ersion: 12
> 2018-10-22T13:25:34.526+0200 DEBUG [GossipTasks:1]
> OutboundMessagingConnection.java:259 connection attempt 789 to
> 10.216.193.246:12700 (GOSSIP)
> 2018-10-22T13:25:34.527+0200 DEBUG [GossipTasks:1] NettyFactory.java:321
> creating outbound bootstrap to peer 10.216.193.246:12700, compression:
> true, encryption: enabled (jdk), coalesce: DISABLED, protocolVersion:
>   12
>
> In the 3.11.3 node I don't see any errors, I do see this:
> 2018-10-22T13:20:36.270+0200  INFO [HANDSHAKE-/10.216.193.244]
> OutboundTcpConnection.java:561 Handshaking version with /10.216.193.244
>
> and in the 4.0 node I find this:
> 2018-10-22T13:20:36.270+0200  INFO
> [MessagingService-NettyInbound-Thread-3-1]
> InboundHandshakeHandler.java:146 connection from peer
> /10.216.193.243:54262, protocol = TLSv1.2, cipher suite =
> TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
> 2018-10-22T13:20:41.147+0200 DEBUG [main]
> OutboundMessagingConnection.java:259 connection attempt 0 to
> 10.216.193.243:12700 (GOSSIP)
> 2018-10-22T13:20:41.147+0200 DEBUG [main] NettyFactory.java:321 creating
> outbound bootstrap to peer 10.216.193.243:12700, compression: true,
> encryption: enabled (jdk), coalesce: DISABLED, protocolVersion: 11
> 2018-10-22T13:20:41.248+0200 DEBUG [ScheduledFastTasks:1]
> OutboundMessagingConnection.java:259 connection attempt 1 to
> 10.216.193.243:12700 (GOSSIP)
> 2018-10-22T13:20:41.248+0200 DEBUG [ScheduledFastTasks:1]
> NettyFactory.java:321 creating outbound bootstrap to peer
> 10.216.193.243:12700, compression: true, encryption: enabled (jdk),
> coalesce: DISABLED, protocolV
> ersion: 11
>
> It looks like the 4.0 node doesn’t accept incoming messages from the old
> nodes on the ssl_storage_port and continues to try to contact the old
> nodes on the storage_port that are disabled in the old nodes. I don't
> really know what to make of this, any ideas or suggestions?
>
> Regards,
> Tommy
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>