Posted to commits@cassandra.apache.org by "Oliver Seiler (JIRA)" <ji...@apache.org> on 2013/12/09 17:42:08 UTC

[jira] [Commented] (CASSANDRA-6349) IOException in MessagingService.run() causes orphaned storage server socket

    [ https://issues.apache.org/jira/browse/CASSANDRA-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843308#comment-13843308 ] 

Oliver Seiler commented on CASSANDRA-6349:
------------------------------------------

I suspect these changes introduced an infinite loop that is triggered when the ServerSocket gets closed (I'm not sure how it ends up closed, though). We've been seeing major problems with Cassandra 2.0.3 when a new cluster comes up for the first time, and this appears to be the cause. With logging set to debug, system.log gets pummelled with these exception messages:

{noformat}
DEBUG [ACCEPT-localhost-grid/10.96.99.178] 2013-12-06 22:55:39,759 MessagingService.java (line 905) Error reading the socket null
java.net.SocketException: Socket closed
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(Unknown Source)
        at java.net.ServerSocket.implAccept(Unknown Source)
        at sun.security.ssl.SSLServerSocketImpl.accept(Unknown Source)
        at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:865)
{noformat}

It looks like nothing will break the thread out of this state once it is in it. Prior to this change the IOException catch block threw another exception; now it just keeps looping on the (seemingly closed) ServerSocket. Restarting Cassandra seems to be the only way to resolve this. I'll probably recommend that we drop back to 2.0.2 until this problem is fixed (or until we understand why the ServerSocket is closed).
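
To illustrate the kind of guard that would avoid the spin described above, here is a minimal sketch. It is not the actual MessagingService.SocketThread code or the attached patch; the class name AcceptLoopSketch, the ioErrors counter and the exitsAfterClose() helper are all made up for this example. The only idea it shows is that the accept loop checks ServerSocket.isClosed() after an IOException and exits instead of retrying forever.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; NOT the actual MessagingService.SocketThread code.
class AcceptLoopSketch implements Runnable {
    private final ServerSocket server;
    // Counts IOExceptions seen by the loop, so a caller can observe spinning.
    final AtomicInteger ioErrors = new AtomicInteger();

    AcceptLoopSketch(ServerSocket server) {
        this.server = server;
    }

    @Override
    public void run() {
        while (!server.isClosed()) {
            try {
                Socket socket = server.accept();
                // Placeholder: a real server would read a header and hand the
                // connection off; here the connection is simply closed.
                socket.close();
            } catch (IOException e) {
                ioErrors.incrementAndGet();
                // A failure on a single connection should not end the loop,
                // but a closed server socket can never accept again.
                if (server.isClosed()) {
                    break;
                }
            }
        }
    }

    // Starts the loop on an ephemeral loopback port, closes the server socket,
    // and reports whether the accept thread terminated.
    static boolean exitsAfterClose() throws IOException, InterruptedException {
        ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        AcceptLoopSketch loop = new AcceptLoopSketch(server);
        Thread t = new Thread(loop, "ACCEPT-sketch");
        t.start();
        Thread.sleep(200);   // give the thread time to block in accept()
        server.close();      // accept() now fails with SocketException
        t.join(5000);
        return !t.isAlive();
    }
}
```

Without the isClosed() checks, the catch block would swallow the SocketException and call accept() again immediately, which matches the repeated "Socket closed" messages in the log above.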


> IOException in MessagingService.run() causes orphaned storage server socket
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6349
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6349
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: cassandra 2.0+
>            Reporter: Steven Halaka
>            Assignee: Mikhail Stepura
>             Fix For: 2.0.3
>
>         Attachments: CASSANDRA-2.0-6349.patch
>
>
> The refactoring of reading the message header in MessagingService.run() vs IncomingTcpConnection seems to mishandle IOException: the exception breaks out of the accept loop, and MessagingService.SocketThread never seems to get reinitialized.
> To reproduce: telnet to port 7000 and send random data. This then prevents any new or restarting node in the cluster from handshaking with this defunct storage port.
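
The telnet reproduction quoted above could also be scripted. The following is only a sketch: the class name GarbageSender and the sendGarbage method are invented for this example, and the host, port and byte count are whatever the caller passes in (port 7000 in the report).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.util.Random;

// Hypothetical helper for illustration; not part of Cassandra.
class GarbageSender {
    // Connects to host:port, writes `length` random bytes, then disconnects.
    static void sendGarbage(String host, int port, int length) throws IOException {
        byte[] junk = new byte[length];
        new Random().nextBytes(junk);
        try (Socket socket = new Socket(host, port);
             OutputStream out = socket.getOutputStream()) {
            out.write(junk);
            out.flush();
        }
    }
}
```

For example, GarbageSender.sendGarbage("localhost", 7000, 64) would send 64 random bytes to a storage port on the local machine, assuming a node is listening there.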



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)