You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@cassandra.apache.org by Tommy Stendahl <to...@ericsson.com> on 2018/10/23 09:44:03 UTC
Testing 4.0 upgrading
Hi,
I have been testing upgrade to 4.0, I started out with a cluster with
3.0.15 and server encryption enabled. Due to some issues in my
environment I did upgrade one of the nodes to 3.11.3, I think this
turned out to be a good thing since I could observer the behaviour of
upgrading from both 3.0.15 and 3.11.3 at the same time.
At first I didn't have much success at all, it look like found multiple
issues mostly with server encryption so I decided to simplify thing and
disabled server encryption.
So with server encryption disabled the upgrade was working ok, what I
did notice was exceptions in the 3.0.15 and 3.11.3 nodes once the first
4.0 node started.
3.0.15 exception:
2018-10-22T11:05:38.883+0200 ERROR
[MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:223
Exception in thread Thread[MessagingService-Incoming-/10.216.193.244,5,main]
java.lang.RuntimeException: Unknown column coordinator_port during
deserialization
at
org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433)
~[apache-cassandra-3.0.15.jar:3.0.15]
at
org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:447)
~[apache-cassandra-3.0.15.jar:3.0.15]
at
org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:647)
~[apache-cassandra-3.0.15.jar:3.0.15]
at
org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:584)
~[apache-cassandra-3.0.15.jar:3.0.15]
at
org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50)
~[apache-cassandra-3.0.15.jar:3.0.15]
at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98)
~[apache-cassandra-3.0.15.jar:3.0.15]
at
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
~[apache-cassandra-3.0.15.jar:3.0.15]
at
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
~[apache-cassandra-3.0.15.jar:3.0.15]
at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
~[apache-cassandra-3.0.15.jar:3.0.15]
3.11.3 exception:
2018-10-22T11:12:05.060+0200 ERROR
[MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:228
Exception in thread Thread[MessagingService-Incoming-/10.216.193.244,5,main]
java.lang.RuntimeException: Unknown column coordinator_port during
deserialization
at
org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:452)
~[apache-cassandra-3.11.3.jar:3.11.3]
at
org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:482)
~[apache-cassandra-3.11.3.jar:3.11.3]
at
org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:760)
~[apache-cassandra-3.11.3.jar:3.11.3]
at
org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:697)
~[apache-cassandra-3.11.3.jar:3.11.3]
at
org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50)
~[apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123)
~[apache-cassandra-3.11.3.jar:3.11.3]
at
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:192)
~[apache-cassandra-3.11.3.jar:3.11.3]
at
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:180)
~[apache-cassandra-3.11.3.jar:3.11.3]
at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94)
~[apache-cassandra-3.11.3.jar:3.11.3]
I got one or two of these exceptions every minute, I could not see that
this was causing any problems but I don't really like exceptions in the
log. Should we expect exceptions like this during an upgrade or is this
a problem we should look in to?
So now I enabled server encryption again, with this configuration:
storage_port: 12700
ssl_storage_port: 12701
server_encryption_options:
# set to true for allowing secure incoming connections
enabled: true
# If enabled and optional are both set to true, encrypted and
unencrypted connections are handled on the storage_port
optional: false
# if enabled, will open up an encrypted listening socket on
ssl_storage_port. Should be used
# during upgrade to 4.0; otherwise, set to false.
enable_legacy_ssl_storage_port: true
# on outbound connections, determine which type of peers to
securely connect to. 'enabled' must be set to true.
internode_encryption: all
keystore: /usr/share/cassandra/.ssl/cil-intern/server/keystore.jks
keystore_password: '*********'
truststore: /usr/share/cassandra/.ssl/cil-intern/server/truststore.jks
truststore_password: '**********'
# More advanced defaults below:
protocol: TLSv1.2
# store_type: JKS
cipher_suites:
[TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA]
# require_client_auth: false
# require_endpoint_verification: false
As far as I understand this configuration is correct but the 4.0 node
can't contact the old nodes. I have tried with the default protocol
setting also but it makes no difference. The behaviour is different in
3.0.15 and 3.11.3, in 3.0.15 I get this exception:
2018-10-22T11:57:44.366+0200 ERROR
[MessagingService-NettyInbound-Thread-3-8]
InboundHandshakeHandler.java:300 Failed to properly handshake with peer
/10.216.193.246:58296. Closing the channel.
io.netty.handler.codec.DecoderException:
javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled
at
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459)
at
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
at
io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:808)
at
io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:417)
at
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:317)
at
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled
at
sun.security.ssl.InputRecord.handleUnknownRecord(InputRecord.java:637)
at sun.security.ssl.InputRecord.read(InputRecord.java:527)
at
sun.security.ssl.EngineInputRecord.read(EngineInputRecord.java:382)
at
sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:962)
at
sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907)
at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
at
io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:294)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1275)
at
io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1177)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1221)
at
io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
at
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
... 14 common frames omitted
and in the 4.0 debug log I see this related to the 3.0.15 node:
2018-10-22T13:25:34.414+0200 DEBUG [ScheduledFastTasks:1]
OutboundMessagingConnection.java:259 connection attempt 788 to
10.216.193.246:12700 (GOSSIP)
2018-10-22T13:25:34.414+0200 DEBUG [ScheduledFastTasks:1]
NettyFactory.java:321 creating outbound bootstrap to peer
10.216.193.246:12700, compression: true, encryption: enabled (jdk),
coalesce: DISABLED, protocolV
ersion: 12
2018-10-22T13:25:34.526+0200 DEBUG [GossipTasks:1]
OutboundMessagingConnection.java:259 connection attempt 789 to
10.216.193.246:12700 (GOSSIP)
2018-10-22T13:25:34.527+0200 DEBUG [GossipTasks:1] NettyFactory.java:321
creating outbound bootstrap to peer 10.216.193.246:12700, compression:
true, encryption: enabled (jdk), coalesce: DISABLED, protocolVersion:
12
In the 3.11.3 node I don't see any errors, I do see this:
2018-10-22T13:20:36.270+0200 INFO [HANDSHAKE-/10.216.193.244]
OutboundTcpConnection.java:561 Handshaking version with /10.216.193.244
and in the 4.0 node I find this:
2018-10-22T13:20:36.270+0200 INFO
[MessagingService-NettyInbound-Thread-3-1]
InboundHandshakeHandler.java:146 connection from peer
/10.216.193.243:54262, protocol = TLSv1.2, cipher suite =
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
2018-10-22T13:20:41.147+0200 DEBUG [main]
OutboundMessagingConnection.java:259 connection attempt 0 to
10.216.193.243:12700 (GOSSIP)
2018-10-22T13:20:41.147+0200 DEBUG [main] NettyFactory.java:321 creating
outbound bootstrap to peer 10.216.193.243:12700, compression: true,
encryption: enabled (jdk), coalesce: DISABLED, protocolVersion: 11
2018-10-22T13:20:41.248+0200 DEBUG [ScheduledFastTasks:1]
OutboundMessagingConnection.java:259 connection attempt 1 to
10.216.193.243:12700 (GOSSIP)
2018-10-22T13:20:41.248+0200 DEBUG [ScheduledFastTasks:1]
NettyFactory.java:321 creating outbound bootstrap to peer
10.216.193.243:12700, compression: true, encryption: enabled (jdk),
coalesce: DISABLED, protocolV
ersion: 11
It looks like the 4.0 node doesn’t accept incoming messages from the old
nodes on the ssl_storage_port and continues to try to contact the old
nodes on the storage_port that are disabled in the old nodes. I don't
really know what to make of this, any ideas or suggestions?
Regards,
Tommy
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org
Re: Testing 4.0 upgrading
Posted by Nate McCall <zz...@gmail.com>.
> Now I will reinstall the old version and do the upgrade again and
> hopefully I can have more information soon.
>
Hi Tommy,
Thanks for taking the time to brave an upgrade test this early on - it
is super helpful to get this feedback.
Anyone else that has bandwidth, we very much appreciate these types of
tests, so please keep them coming.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org
Re: Testing 4.0 upgrading
Posted by Tommy Stendahl <to...@ericsson.com>.
Hi Jason,
Thanks for responding.
I have created two jiras for these things: CASSANDRA-14841 and
CASSANDRA-14842.
I understand the initial 4.0->3.x connection problem, since the incoming
connection from the old node isn't accepted the 4.0 node will never know
that it tries to connect to a old node and continuous to try to connect
on the storage_port instead on the ssl_storage_port.
I will enable SSL debugging and activate wire tracing and see if I can
find out more.
I can also add that I have upgraded all nodes in the cluster to 4.0 and
once a node was upgraded to 4.0 it started talking to the other 4.0 node
as expected. I had data in the cluster from before the upgrade and after
doing "nodetool upgradesstables" there where no issues restarting the
clients.
Now I will reinstall the old version and do the upgrade again and
hopefully I can have more information soon.
/Tommy
On 2018-10-23 19:12, Jason Brown wrote:
> Hi Tommy,
>
> Thank you for taking the time to start kicking the tires on the upgrade to
> 4.0. It looks like you've found two bugs:
>
> 1) "Unknown column coordinator_port during deserialization" (reported on
> 3.x nodes)
>
> - looks like the 4.0 node isn't filtering out a column from a system table
> that 3.0 doesn't know about. Most likely due to CASSANDRA-7544. Can you
> open a JIRA for this, and tag @aweisberg?
>
> 2) SSL connection problems
>
> I unserstand the 4.0 -> 3.X connection problem, and documented it at [1] in
> MessagingService. TL;DR we don't know the version of a peer when restarting
> and need to wait for that peer to connect to the local node and pass it's
> correct messaging version (if the local node cannot interpret the handhsake
> from the peer). However, why for the inbound connection to the 4.0 node it
> is seeing SSLv2 is unclear. Can you open a separate JIRA, and we'll go from
> there? In the meantime, maybe enable the JDK's SSL debugging [2] on the 3.x
> node to see exactly what it is trying to do? Also, you can enable wire
> tracing on the 4.0 node by setting this value to true [3] and recompiling.
> We can followup further in the jira.
>
> Thanks!
>
> -Jason
>
> [1]
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/MessagingService.java#L1668
> [2]
> https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/ReadDebug.html
> [3]
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/async/NettyFactory.java#L84
>
> On Tue, Oct 23, 2018 at 2:44 AM Tommy Stendahl <to...@ericsson.com>
> wrote:
>
>> Hi,
>>
>> I have been testing upgrade to 4.0, I started out with a cluster with
>> 3.0.15 and server encryption enabled. Due to some issues in my
>> environment I did upgrade one of the nodes to 3.11.3, I think this
>> turned out to be a good thing since I could observer the behaviour of
>> upgrading from both 3.0.15 and 3.11.3 at the same time.
>>
>> At first I didn't have much success at all, it look like found multiple
>> issues mostly with server encryption so I decided to simplify thing and
>> disabled server encryption.
>>
>> So with server encryption disabled the upgrade was working ok, what I
>> did notice was exceptions in the 3.0.15 and 3.11.3 nodes once the first
>> 4.0 node started.
>>
>> 3.0.15 exception:
>> 2018-10-22T11:05:38.883+0200 ERROR
>> [MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:223
>> Exception in thread Thread[MessagingService-Incoming-/10.216.193.244
>> ,5,main]
>> java.lang.RuntimeException: Unknown column coordinator_port during
>> deserialization
>> at
>> org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433)
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>> at
>> org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:447)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>> at
>> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:647)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>> at
>> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:584)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>> at
>> org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98)
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>>
>> 3.11.3 exception:
>> 2018-10-22T11:12:05.060+0200 ERROR
>> [MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:228
>> Exception in thread Thread[MessagingService-Incoming-/10.216.193.244
>> ,5,main]
>> java.lang.RuntimeException: Unknown column coordinator_port during
>> deserialization
>> at
>> org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:452)
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>> at
>> org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:482)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>> at
>> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:760)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>> at
>> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:697)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>> at
>> org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123)
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:192)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:180)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>>
>> I got one or two of these exceptions every minute, I could not see that
>> this was causing any problems but I don't really like exceptions in the
>> log. Should we expect exceptions like this during an upgrade or is this
>> a problem we should look in to?
>>
>> So now I enabled server encryption again, with this configuration:
>> storage_port: 12700
>> ssl_storage_port: 12701
>> server_encryption_options:
>> # set to true for allowing secure incoming connections
>> enabled: true
>> # If enabled and optional are both set to true, encrypted and
>> unencrypted connections are handled on the storage_port
>> optional: false
>> # if enabled, will open up an encrypted listening socket on
>> ssl_storage_port. Should be used
>> # during upgrade to 4.0; otherwise, set to false.
>> enable_legacy_ssl_storage_port: true
>> # on outbound connections, determine which type of peers to
>> securely connect to. 'enabled' must be set to true.
>> internode_encryption: all
>> keystore: /usr/share/cassandra/.ssl/cil-intern/server/keystore.jks
>> keystore_password: '*********'
>> truststore: /usr/share/cassandra/.ssl/cil-intern/server/truststore.jks
>> truststore_password: '**********'
>> # More advanced defaults below:
>> protocol: TLSv1.2
>> # store_type: JKS
>> cipher_suites:
>>
>> [TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA]
>> # require_client_auth: false
>> # require_endpoint_verification: false
>>
>> As far as I understand this configuration is correct but the 4.0 node
>> can't contact the old nodes. I have tried with the default protocol
>> setting also but it makes no difference. The behaviour is different in
>> 3.0.15 and 3.11.3, in 3.0.15 I get this exception:
>> 2018-10-22T11:57:44.366+0200 ERROR
>> [MessagingService-NettyInbound-Thread-3-8]
>> InboundHandshakeHandler.java:300 Failed to properly handshake with peer
>> /10.216.193.246:58296. Closing the channel.
>> io.netty.handler.codec.DecoderException:
>> javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled
>> at
>>
>> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459)
>> at
>>
>> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
>> at
>>
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>> at
>>
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>> at
>>
>> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
>> at
>>
>> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
>> at
>>
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>> at
>>
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>> at
>>
>> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
>> at
>>
>> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:808)
>> at
>> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:417)
>> at
>> io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:317)
>> at
>>
>> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
>> at
>>
>> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled
>> at
>> sun.security.ssl.InputRecord.handleUnknownRecord(InputRecord.java:637)
>> at sun.security.ssl.InputRecord.read(InputRecord.java:527)
>> at
>> sun.security.ssl.EngineInputRecord.read(EngineInputRecord.java:382)
>> at
>> sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:962)
>> at
>> sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907)
>> at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
>> at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
>> at
>> io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:294)
>> at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1275)
>> at
>> io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1177)
>> at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1221)
>> at
>>
>> io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
>> at
>>
>> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
>> ... 14 common frames omitted
>>
>> and in the 4.0 debug log I see this related to the 3.0.15 node:
>> 2018-10-22T13:25:34.414+0200 DEBUG [ScheduledFastTasks:1]
>> OutboundMessagingConnection.java:259 connection attempt 788 to
>> 10.216.193.246:12700 (GOSSIP)
>> 2018-10-22T13:25:34.414+0200 DEBUG [ScheduledFastTasks:1]
>> NettyFactory.java:321 creating outbound bootstrap to peer
>> 10.216.193.246:12700, compression: true, encryption: enabled (jdk),
>> coalesce: DISABLED, protocolV
>> ersion: 12
>> 2018-10-22T13:25:34.526+0200 DEBUG [GossipTasks:1]
>> OutboundMessagingConnection.java:259 connection attempt 789 to
>> 10.216.193.246:12700 (GOSSIP)
>> 2018-10-22T13:25:34.527+0200 DEBUG [GossipTasks:1] NettyFactory.java:321
>> creating outbound bootstrap to peer 10.216.193.246:12700, compression:
>> true, encryption: enabled (jdk), coalesce: DISABLED, protocolVersion:
>> 12
>>
>> In the 3.11.3 node I don't see any errors, I do see this:
>> 2018-10-22T13:20:36.270+0200 INFO [HANDSHAKE-/10.216.193.244]
>> OutboundTcpConnection.java:561 Handshaking version with /10.216.193.244
>>
>> and in the 4.0 node I find this:
>> 2018-10-22T13:20:36.270+0200 INFO
>> [MessagingService-NettyInbound-Thread-3-1]
>> InboundHandshakeHandler.java:146 connection from peer
>> /10.216.193.243:54262, protocol = TLSv1.2, cipher suite =
>> TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
>> 2018-10-22T13:20:41.147+0200 DEBUG [main]
>> OutboundMessagingConnection.java:259 connection attempt 0 to
>> 10.216.193.243:12700 (GOSSIP)
>> 2018-10-22T13:20:41.147+0200 DEBUG [main] NettyFactory.java:321 creating
>> outbound bootstrap to peer 10.216.193.243:12700, compression: true,
>> encryption: enabled (jdk), coalesce: DISABLED, protocolVersion: 11
>> 2018-10-22T13:20:41.248+0200 DEBUG [ScheduledFastTasks:1]
>> OutboundMessagingConnection.java:259 connection attempt 1 to
>> 10.216.193.243:12700 (GOSSIP)
>> 2018-10-22T13:20:41.248+0200 DEBUG [ScheduledFastTasks:1]
>> NettyFactory.java:321 creating outbound bootstrap to peer
>> 10.216.193.243:12700, compression: true, encryption: enabled (jdk),
>> coalesce: DISABLED, protocolV
>> ersion: 11
>>
>> It looks like the 4.0 node doesn’t accept incoming messages from the old
>> nodes on the ssl_storage_port and continues to try to contact the old
>> nodes on the storage_port that are disabled in the old nodes. I don't
>> really know what to make of this, any ideas or suggestions?
>>
>> Regards,
>> Tommy
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>
>>
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org
Re: Testing 4.0 upgrading
Posted by Jason Brown <ja...@gmail.com>.
Hi Tommy,
Thank you for taking the time to start kicking the tires on the upgrade to
4.0. It looks like you've found two bugs:
1) "Unknown column coordinator_port during deserialization" (reported on
3.x nodes)
- looks like the 4.0 node isn't filtering out a column from a system table
that 3.0 doesn't know about. Most likely due to CASSANDRA-7544. Can you
open a JIRA for this, and tag @aweisberg?
2) SSL connection problems
I unserstand the 4.0 -> 3.X connection problem, and documented it at [1] in
MessagingService. TL;DR we don't know the version of a peer when restarting
and need to wait for that peer to connect to the local node and pass it's
correct messaging version (if the local node cannot interpret the handhsake
from the peer). However, why for the inbound connection to the 4.0 node it
is seeing SSLv2 is unclear. Can you open a separate JIRA, and we'll go from
there? In the meantime, maybe enable the JDK's SSL debugging [2] on the 3.x
node to see exactly what it is trying to do? Also, you can enable wire
tracing on the 4.0 node by setting this value to true [3] and recompiling.
We can followup further in the jira.
Thanks!
-Jason
[1]
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/MessagingService.java#L1668
[2]
https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/ReadDebug.html
[3]
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/async/NettyFactory.java#L84
On Tue, Oct 23, 2018 at 2:44 AM Tommy Stendahl <to...@ericsson.com>
wrote:
> Hi,
>
> I have been testing upgrade to 4.0, I started out with a cluster with
> 3.0.15 and server encryption enabled. Due to some issues in my
> environment I did upgrade one of the nodes to 3.11.3, I think this
> turned out to be a good thing since I could observer the behaviour of
> upgrading from both 3.0.15 and 3.11.3 at the same time.
>
> At first I didn't have much success at all, it look like found multiple
> issues mostly with server encryption so I decided to simplify thing and
> disabled server encryption.
>
> So with server encryption disabled the upgrade was working ok, what I
> did notice was exceptions in the 3.0.15 and 3.11.3 nodes once the first
> 4.0 node started.
>
> 3.0.15 exception:
> 2018-10-22T11:05:38.883+0200 ERROR
> [MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:223
> Exception in thread Thread[MessagingService-Incoming-/10.216.193.244
> ,5,main]
> java.lang.RuntimeException: Unknown column coordinator_port during
> deserialization
> at
> org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433)
> ~[apache-cassandra-3.0.15.jar:3.0.15]
> at
> org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:447)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
> at
> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:647)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
> at
> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:584)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
> at
> org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98)
> ~[apache-cassandra-3.0.15.jar:3.0.15]
> at
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
> at
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
> at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
>
> 3.11.3 exception:
> 2018-10-22T11:12:05.060+0200 ERROR
> [MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:228
> Exception in thread Thread[MessagingService-Incoming-/10.216.193.244
> ,5,main]
> java.lang.RuntimeException: Unknown column coordinator_port during
> deserialization
> at
> org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:452)
> ~[apache-cassandra-3.11.3.jar:3.11.3]
> at
> org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:482)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
> at
> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:760)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
> at
> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:697)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
> at
> org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123)
> ~[apache-cassandra-3.11.3.jar:3.11.3]
> at
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:192)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
> at
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:180)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
> at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>
> I got one or two of these exceptions every minute, I could not see that
> this was causing any problems but I don't really like exceptions in the
> log. Should we expect exceptions like this during an upgrade or is this
> a problem we should look in to?
>
> So now I enabled server encryption again, with this configuration:
> storage_port: 12700
> ssl_storage_port: 12701
> server_encryption_options:
> # set to true for allowing secure incoming connections
> enabled: true
> # If enabled and optional are both set to true, encrypted and
> unencrypted connections are handled on the storage_port
> optional: false
> # if enabled, will open up an encrypted listening socket on
> ssl_storage_port. Should be used
> # during upgrade to 4.0; otherwise, set to false.
> enable_legacy_ssl_storage_port: true
> # on outbound connections, determine which type of peers to
> securely connect to. 'enabled' must be set to true.
> internode_encryption: all
> keystore: /usr/share/cassandra/.ssl/cil-intern/server/keystore.jks
> keystore_password: '*********'
> truststore: /usr/share/cassandra/.ssl/cil-intern/server/truststore.jks
> truststore_password: '**********'
> # More advanced defaults below:
> protocol: TLSv1.2
> # store_type: JKS
> cipher_suites:
>
> [TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA]
> # require_client_auth: false
> # require_endpoint_verification: false
>
> As far as I understand this configuration is correct but the 4.0 node
> can't contact the old nodes. I have tried with the default protocol
> setting also but it makes no difference. The behaviour is different in
> 3.0.15 and 3.11.3, in 3.0.15 I get this exception:
> 2018-10-22T11:57:44.366+0200 ERROR
> [MessagingService-NettyInbound-Thread-3-8]
> InboundHandshakeHandler.java:300 Failed to properly handshake with peer
> /10.216.193.246:58296. Closing the channel.
> io.netty.handler.codec.DecoderException:
> javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled
> at
>
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459)
> at
>
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
> at
>
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at
>
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at
>
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
> at
>
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
> at
>
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at
>
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at
>
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
> at
>
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:808)
> at
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:417)
> at
> io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:317)
> at
>
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
> at
>
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled
> at
> sun.security.ssl.InputRecord.handleUnknownRecord(InputRecord.java:637)
> at sun.security.ssl.InputRecord.read(InputRecord.java:527)
> at
> sun.security.ssl.EngineInputRecord.read(EngineInputRecord.java:382)
> at
> sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:962)
> at
> sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907)
> at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
> at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
> at
> io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:294)
> at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1275)
> at
> io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1177)
> at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1221)
> at
>
> io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
> at
>
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
> ... 14 common frames omitted
>
> and in the 4.0 debug log I see this related to the 3.0.15 node:
> 2018-10-22T13:25:34.414+0200 DEBUG [ScheduledFastTasks:1]
> OutboundMessagingConnection.java:259 connection attempt 788 to
> 10.216.193.246:12700 (GOSSIP)
> 2018-10-22T13:25:34.414+0200 DEBUG [ScheduledFastTasks:1]
> NettyFactory.java:321 creating outbound bootstrap to peer
> 10.216.193.246:12700, compression: true, encryption: enabled (jdk),
> coalesce: DISABLED, protocolV
> ersion: 12
> 2018-10-22T13:25:34.526+0200 DEBUG [GossipTasks:1]
> OutboundMessagingConnection.java:259 connection attempt 789 to
> 10.216.193.246:12700 (GOSSIP)
> 2018-10-22T13:25:34.527+0200 DEBUG [GossipTasks:1] NettyFactory.java:321
> creating outbound bootstrap to peer 10.216.193.246:12700, compression:
> true, encryption: enabled (jdk), coalesce: DISABLED, protocolVersion:
> 12
>
> In the 3.11.3 node I don't see any errors, I do see this:
> 2018-10-22T13:20:36.270+0200 INFO [HANDSHAKE-/10.216.193.244]
> OutboundTcpConnection.java:561 Handshaking version with /10.216.193.244
>
> and in the 4.0 node I find this:
> 2018-10-22T13:20:36.270+0200 INFO
> [MessagingService-NettyInbound-Thread-3-1]
> InboundHandshakeHandler.java:146 connection from peer
> /10.216.193.243:54262, protocol = TLSv1.2, cipher suite =
> TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
> 2018-10-22T13:20:41.147+0200 DEBUG [main]
> OutboundMessagingConnection.java:259 connection attempt 0 to
> 10.216.193.243:12700 (GOSSIP)
> 2018-10-22T13:20:41.147+0200 DEBUG [main] NettyFactory.java:321 creating
> outbound bootstrap to peer 10.216.193.243:12700, compression: true,
> encryption: enabled (jdk), coalesce: DISABLED, protocolVersion: 11
> 2018-10-22T13:20:41.248+0200 DEBUG [ScheduledFastTasks:1]
> OutboundMessagingConnection.java:259 connection attempt 1 to
> 10.216.193.243:12700 (GOSSIP)
> 2018-10-22T13:20:41.248+0200 DEBUG [ScheduledFastTasks:1]
> NettyFactory.java:321 creating outbound bootstrap to peer
> 10.216.193.243:12700, compression: true, encryption: enabled (jdk),
> coalesce: DISABLED, protocolV
> ersion: 11
>
> It looks like the 4.0 node doesn’t accept incoming messages from the old
> nodes on the ssl_storage_port and continues to try to contact the old
> nodes on the storage_port that are disabled in the old nodes. I don't
> really know what to make of this, any ideas or suggestions?
>
> Regards,
> Tommy
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>