Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2015/09/01 14:37:46 UTC
[jira] [Updated] (CASSANDRA-10052) Bringing one node down, makes
the whole cluster go down for a second
[ https://issues.apache.org/jira/browse/CASSANDRA-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-10052:
---------------------------------------
Priority: Major (was: Critical)
Fix Version/s: 2.2.x, 2.1.x
> Bringing one node down, makes the whole cluster go down for a second
> --------------------------------------------------------------------
>
> Key: CASSANDRA-10052
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10052
> Project: Cassandra
> Issue Type: Bug
> Reporter: Sharvanath Pathak
> Assignee: Stefania
> Fix For: 2.1.x, 2.2.x
>
>
> When a node goes down, the other nodes learn about it through gossip.
> I do see the log line emitted by markDead in Gossiper.java:
> {code}
> private void markDead(InetAddress addr, EndpointState localState)
> {
>     if (logger.isTraceEnabled())
>         logger.trace("marking as down {}", addr);
>     localState.markDead();
>     liveEndpoints.remove(addr);
>     unreachableEndpoints.put(addr, System.nanoTime());
>     logger.info("InetAddress {} is now DOWN", addr);
>     for (IEndpointStateChangeSubscriber subscriber : subscribers)
>         subscriber.onDead(addr, localState);
>     if (logger.isTraceEnabled())
>         logger.trace("Notified " + subscribers);
> }
> {code}
> The Cassandra system log then shows "InetAddress 192.168.101.1 is now DOWN".
> At that point, on all the other nodes, the client side (Java driver) reports "Cannot connect to any host, scheduling retry in 1000 milliseconds". The clients eventually reconnect, but some queries fail during this intermediate period.
> It seems to me that when the server pushes the nodeDown event, it calls getRpcAddress(endpoint) and therefore sends localhost as the address in the nodeDown event.
> The relevant code is in org.apache.cassandra.transport.Server.java:
> {code}
> public void onDown(InetAddress endpoint)
> {
>     server.connectionTracker.send(Event.StatusChange.nodeDown(getRpcAddress(endpoint), server.socket.getPort()));
> }
> {code}
> getRpcAddress returns localhost for any endpoint if cassandra.yaml uses localhost for rpc_address (which, by the way, is the default).
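
To illustrate the failure mode suspected in the description above, here is a minimal, self-contained sketch. It is not Cassandra or driver code: NodeDownSketch, ToyClient, remoteNode and getRpcAddressAsReported are hypothetical names, and the lookup simply hard-codes the behaviour the reporter describes (the loopback address is returned for any endpoint). It assumes a client that drops a host from its live set when it receives a nodeDown event for that host's address.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashSet;
import java.util.Set;

final class NodeDownSketch {

    // Hypothetical model of the reported behaviour: with rpc_address set to
    // localhost, the lookup ignores the endpoint and returns the loopback address.
    static InetAddress getRpcAddressAsReported(InetAddress endpoint) {
        return InetAddress.getLoopbackAddress();
    }

    // Hypothetical toy client: it tracks live hosts and removes a host
    // when told that the host's address is down.
    static final class ToyClient {
        private final Set<InetAddress> liveHosts = new HashSet<>();

        ToyClient(InetAddress contactPoint) {
            liveHosts.add(contactPoint);
        }

        void onNodeDown(InetAddress address) {
            liveHosts.remove(address);
        }

        boolean hasLiveHost() {
            return !liveHosts.isEmpty();
        }
    }

    // The remote node from the log message in the report (192.168.101.1).
    static InetAddress remoteNode() {
        try {
            return InetAddress.getByAddress(new byte[] {(byte) 192, (byte) 168, 101, 1});
        } catch (UnknownHostException e) {
            // Cannot happen for a 4-byte address; rethrown to keep callers simple.
            throw new IllegalStateException(e);
        }
    }

    // A client connected to its local node receives a nodeDown event for a
    // different, remote node. Returns whether the client still has a live host.
    static boolean clientSurvivesRemoteNodeDown() {
        ToyClient client = new ToyClient(InetAddress.getLoopbackAddress());
        client.onNodeDown(getRpcAddressAsReported(remoteNode()));
        return client.hasLiveHost();
    }

    public static void main(String[] args) {
        System.out.println("client still has a live host: " + clientSurvivesRemoteNodeDown());
    }
}
```

In this toy model the event about the remote node carries the loopback address, so the client discards its only (local) host even though that node is healthy. Whether the real driver reacts this way is an assumption of the sketch, not something the report confirms.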
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)