You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Steven Lowenthal (JIRA)" <ji...@apache.org> on 2015/09/14 06:47:46 UTC

[jira] [Commented] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds

    [ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742914#comment-14742914 ] 

Steven Lowenthal commented on CASSANDRA-8072:
---------------------------------------------

I can easily reproduce this with my automated launcher  I even tried to better randomize when non-seed nodes come in to join.   I recently noticed that the seed node has a socket stuck in CLOSE_WAIT for the nodes that report can't gossip with any seeds.  Perhaps the solution lies in ensuring that both ends of the connection properly close the connection.  It's likely the client (the node asking to join) exceptions out and dies without elegantly closing the connection.   

> Exception during startup: Unable to gossip with any seeds
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-8072
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ryan Springer
>            Assignee: Brandon Williams
>             Fix For: 2.1.x
>
>         Attachments: cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2, cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2, cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2, casandra-system-log-with-assert-patch.log, trace_logs.tar.bz2
>
>
> When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster in either ec2 or locally, an error occurs sometimes with one of the nodes refusing to start C*.  The error in the /var/log/cassandra/system.log is:
> ERROR [main] 2014-10-06 15:54:52,292 CassandraDaemon.java (line 513) Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
>         at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200)
>         at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:444)
>         at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:655)
>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:609)
>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:502)
>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378)
>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
>  INFO [StorageServiceShutdownHook] 2014-10-06 15:54:52,326 Gossiper.java (line 1279) Announcing shutdown
>  INFO [StorageServiceShutdownHook] 2014-10-06 15:54:54,326 MessagingService.java (line 701) Waiting for messaging service to quiesce
>  INFO [ACCEPT-localhost/127.0.0.1] 2014-10-06 15:54:54,327 MessagingService.java (line 941) MessagingService has terminated the accept() thread
> This errors does not always occur when provisioning a 2-node cluster, but probably around half of the time on only one of the nodes.  I haven't been able to reproduce this error with DSC 2.0.9, and there have been no code or definition file changes in Opscenter.
> I can reproduce locally with the above steps.  I'm happy to test any proposed fixes since I'm the only person able to reproduce reliably so far.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)