Posted to commits@cassandra.apache.org by "Paulo Motta (JIRA)" <ji...@apache.org> on 2017/06/17 00:14:00 UTC

[jira] [Commented] (CASSANDRA-13608) Connection closed/reopened during join causes Cassandra stream to close

    [ https://issues.apache.org/jira/browse/CASSANDRA-13608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052557#comment-16052557 ] 

Paulo Motta commented on CASSANDRA-13608:
-----------------------------------------

Can you attach the debug.log from the source and destination nodes?

> Connection closed/reopened during join causes Cassandra stream to close
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-13608
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13608
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>         Environment: Cassandra 3.10. Windows Server 2016, 32GB ram, 2TB hard disk, RAID10 with 4 spindles, 8 Cores
>            Reporter: Tania S Engel
>             Fix For: 3.10
>
>         Attachments: Cassandra 3.10 Join with lots GC collection leads to socket closure and join hang.mht
>
>
> We start a JOIN bootstrap. The primary seed node streams to the replica. The replica requires some GC cleanup and experiences frequent pauses, including a 12-second old-gen cleanup following a memtable flush. Both replica and primary show _MessagingService IOException: An existing connection was forcibly closed by the remote host_. The replica's MessagingService-Outgoing reestablishes the connection immediately, but the primary's StreamKeepAliveExecutor throws a _java.lang.RuntimeException: Outgoing stream handler has been closed_. From that point forward, the replica stays in JOIN mode, sending keep-alives to the primary. The primary receives the keep-alives but does not send its own, and it repeatedly fails to send a hints file to the replica. It seems this limping condition would continue indefinitely, but it stops when we stop Cassandra on the replica. If we restart Cassandra on the replica, the JOIN picks up again but fails with _java.io.IOException: Corrupt value length 355151036 encountered, as it exceeds the maximum of 268435456, which is set via max_value_size_in_mb in cassandra.yaml_. We have not increased this value because we do not have values that large in our data, so we presume the value is indeed corrupt and that moving past it would not be a good idea. Please see the attachment for details.
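
For readers checking the numbers in the error above: the maximum of 268435456 bytes is what a max_value_size_in_mb of 256 works out to, and the reported length of 355151036 bytes is roughly 339 MiB. The sketch below only reproduces that arithmetic. It is not Cassandra's code; the helper names are hypothetical, and the setting of 256 is an assumption inferred from the limit quoted in the message.

```python
# Illustrative arithmetic only; not taken from the Cassandra code base.
# Assumption: max_value_size_in_mb is 256, inferred from the 268435456 limit in the error.
MAX_VALUE_SIZE_IN_MB = 256

BYTES_PER_MB = 1024 * 1024


def max_value_size_bytes(mb: int) -> int:
    """Convert a size in megabytes (as in cassandra.yaml) to bytes."""
    return mb * BYTES_PER_MB


def exceeds_limit(length: int, mb: int = MAX_VALUE_SIZE_IN_MB) -> bool:
    """Return True when a value length in bytes is strictly above the limit."""
    return length > max_value_size_bytes(mb)


if __name__ == "__main__":
    reported_length = 355151036  # length quoted in the IOException
    print("limit in bytes:", max_value_size_bytes(MAX_VALUE_SIZE_IN_MB))
    print("reported length in MiB:", round(reported_length / BYTES_PER_MB, 1))
    print("exceeds limit:", exceeds_limit(reported_length))
```

A length this far above anything the data set is expected to hold is consistent with the reporter's reading that the stream contents are corrupt rather than that the limit is too low.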


