Posted to commits@cassandra.apache.org by "Sharvanath Pathak (JIRA)" <ji...@apache.org> on 2015/08/12 09:20:45 UTC

[jira] [Created] (CASSANDRA-10052) Bring one node down, makes the whole cluster go down for a second

Sharvanath Pathak created CASSANDRA-10052:
---------------------------------------------

             Summary: Bring one node down, makes the whole cluster go down for a second
                 Key: CASSANDRA-10052
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10052
             Project: Cassandra
          Issue Type: Bug
            Reporter: Sharvanath Pathak


When a node goes down, the other nodes learn about it through gossip.

I do see the corresponding log line, which comes from markDead in Gossiper.java:
```
private void markDead(InetAddress addr, EndpointState localState)
{
    if (logger.isTraceEnabled())
        logger.trace("marking as down {}", addr);
    localState.markDead();
    liveEndpoints.remove(addr);
    unreachableEndpoints.put(addr, System.nanoTime());
    logger.info("InetAddress {} is now DOWN", addr);
    for (IEndpointStateChangeSubscriber subscriber : subscribers)
        subscriber.onDead(addr, localState);
    if (logger.isTraceEnabled())
        logger.trace("Notified " + subscribers);
}
```

It shows up in Cassandra's system log as "InetAddress 192.168.101.1 is now DOWN".

At that point, on all the other nodes, the client side (Java driver) reports "Cannot connect to any host, scheduling retry in 1000 milliseconds". The clients do eventually reconnect, but some queries fail during this intermediate period.

To me it seems that when the server pushes the nodeDown event, it calls getRpcAddress(endpoint) and thus sends localhost as the address argument of the nodeDown event.

As in org.apache.cassandra.transport.Server.java:
```
public void onDown(InetAddress endpoint)
{
    server.connectionTracker.send(Event.StatusChange.nodeDown(getRpcAddress(endpoint), server.socket.getPort()));
}
```
getRpcAddress returns localhost for any endpoint if cassandra.yaml uses localhost as the value of rpc_address (which, by the way, is the default).
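
To illustrate what I suspect is happening, here is a minimal, self-contained sketch. It is not Cassandra's code: the class RpcAddressSketch and the method rpcAddressFor are hypothetical names, and the method only models the behavior described above, where the configured rpc_address is returned no matter which endpoint went down.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class RpcAddressSketch {
    // Hypothetical stand-in for the lookup described in this report:
    // the configured rpc_address is returned for every endpoint, so the
    // endpoint argument has no influence on the result.
    static InetAddress rpcAddressFor(InetAddress endpoint, InetAddress configuredRpcAddress) {
        return configuredRpcAddress;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Assumed configuration: rpc_address set to localhost.
        // getByAddress builds the address without any DNS lookup.
        InetAddress configured =
                InetAddress.getByAddress("localhost", new byte[] {127, 0, 0, 1});

        // The remote node that gossip reported as down.
        InetAddress downNode =
                InetAddress.getByAddress(new byte[] {(byte) 192, (byte) 168, 101, 1});

        InetAddress reported = rpcAddressFor(downNode, configured);

        System.out.println("node that went down:       " + downNode.getHostAddress());
        System.out.println("address in nodeDown event: " + reported.getHostAddress());
    }
}
```

In this model the event carries 127.0.0.1 rather than 192.168.101.1, so a client connected to a healthy node over localhost would be told that the host it is talking to went down. That would be consistent with the "Cannot connect to any host" message, though I have not confirmed this is the actual cause.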


