Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2017/03/13 08:43:04 UTC

[jira] [Commented] (CASSANDRA-13323) IncomingTcpConnection closed due to one bad message

    [ https://issues.apache.org/jira/browse/CASSANDRA-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907017#comment-15907017 ] 

Sylvain Lebresne commented on CASSANDRA-13323:
----------------------------------------------

Pretty sure this patch is not going to work. When you get the {{UnknownColumnFamilyException}}, only part of the message has been deserialized, so trying to deserialize further messages on that connection is going to read (what looks like) garbage. This is, in fact, why we currently just throw out the connection: it's the simplest, safest thing to do.

This doesn't mean, btw, that we couldn't have a way to resume after a failed message (at least when we know the failure is not due to a corrupted stream, as in this particular case), but it's a bit more involved. The simplest somewhat-generic solution I see, fwiw, would be to wrap the DataInput into one that counts how many bytes are deserialized. We'd reset the counter at the beginning of each payload and, on an exception, we'd know how many bytes we have to skip to resume reading properly at the next message.
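
To make the counting idea concrete, here is a minimal standalone sketch. It is not Cassandra code: {{CountingInputStream}}, {{UnknownTableException}}, {{frame}}, {{deserialize}} and {{readAll}} are hypothetical names, and the wire format (a 4-byte payload size followed by a table id and a UTF string) is an assumption made purely for illustration. It also assumes the payload size is read before the payload, so that on failure the remainder can be computed as the payload size minus the bytes counted so far.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class ResumableReader {
    /** Hypothetical wrapper that counts the bytes consumed from the underlying stream. */
    static final class CountingInputStream extends FilterInputStream {
        private long count;
        CountingInputStream(InputStream in) { super(in); }
        @Override public int read() throws IOException {
            int b = super.read();
            if (b >= 0) count++;
            return b;
        }
        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) count += n;
            return n;
        }
        @Override public long skip(long n) throws IOException {
            long skipped = super.skip(n);
            count += skipped;
            return skipped;
        }
        void resetCount() { count = 0; }
        long count() { return count; }
    }

    /** Stand-in for a failure that leaves the stream itself intact. */
    static final class UnknownTableException extends IOException {
        UnknownTableException(int tableId) { super("unknown table " + tableId); }
    }

    /** Assumed payload layout: int table id, then a UTF string. Only table 1 exists. */
    static String deserialize(DataInputStream in) throws IOException {
        int tableId = in.readInt();
        if (tableId != 1) throw new UnknownTableException(tableId); // fails mid-payload
        return in.readUTF();
    }

    /** Writes one message in the assumed format: payload size, then payload. */
    static byte[] frame(int tableId, String value) throws IOException {
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        DataOutputStream p = new DataOutputStream(payload);
        p.writeInt(tableId);
        p.writeUTF(value);
        ByteArrayOutputStream framed = new ByteArrayOutputStream();
        new DataOutputStream(framed).writeInt(payload.size());
        payload.writeTo(framed);
        return framed.toByteArray();
    }

    static List<String> readAll(byte[] wire) throws IOException {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(wire));
        DataInputStream in = new DataInputStream(counter);
        List<String> messages = new ArrayList<>();
        while (in.available() > 0) { // fine for an in-memory demo; a socket would block on read
            int payloadSize = in.readInt();
            counter.resetCount(); // count only the bytes of this payload
            try {
                messages.add(deserialize(in));
            } catch (UnknownTableException e) {
                // Skip the part of the payload that was never deserialized.
                int remaining = (int) (payloadSize - counter.count());
                if (in.skipBytes(remaining) != remaining)
                    throw new IOException("stream ended inside a payload");
            }
        }
        return messages;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        wire.write(frame(1, "first"));
        wire.write(frame(99, "for a dropped table"));
        wire.write(frame(1, "third"));
        System.out.println(readAll(wire.toByteArray())); // prints [first, third]
    }
}
```

In this sketch only the known, non-corrupting failure is caught. Any other exception still propagates, which corresponds to dropping the connection as is done today.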

> IncomingTcpConnection closed due to one bad message
> ---------------------------------------------------
>
>                 Key: CASSANDRA-13323
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13323
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Simon Zhou
>            Assignee: Simon Zhou
>             Fix For: 3.0.13
>
>         Attachments: CASSANDRA-13323-v1.patch
>
>
> We got this exception:
> {code}
> WARN  [MessagingService-Incoming-/****] 2017-02-14 17:33:33,177 IncomingTcpConnection.java:101 - UnknownColumnFamilyException reading from socket; closing
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for cfId 2a3ab630-df74-11e6-9f81-b56251e1559e. If a table was just created, this is likely due to the schema not being fully propagated.  Please wait for schema agreement on table creation.
>     at org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:131) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:113) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) ~[apache-cassandra-3.0.10.jar:3.0.10]
> {code}
> We also saw this log on another host, indicating that it needs to reconnect:
> {code}
> INFO  [HANDSHAKE-/****] 2017-02-21 13:37:50,216 OutboundTcpConnection.java:515 - Handshaking version with /****
> {code}
> The reason is that the node was receiving hinted data for a dropped table. This may happen with other messages as well. On the Cassandra side, IncomingTcpConnection shouldn't close on just one bad message, even though it will be restarted soon afterwards by SocketThread in MessagingService.


