Posted to commits@cassandra.apache.org by "Joel Knighton (JIRA)" <ji...@apache.org> on 2015/11/13 17:59:11 UTC
[jira] [Commented] (CASSANDRA-10111) reconnecting snitch can bypass cluster name check

    [ https://issues.apache.org/jira/browse/CASSANDRA-10111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004289#comment-15004289 ] 

Joel Knighton commented on CASSANDRA-10111:
-------------------------------------------

This can occur because we only check for cluster name mismatches in the {{GossipDigestSynVerbHandler}}.  In the original design of Cassandra, this was sufficient, since we always replied to the {{listen_address}}.

Since we now reply to the {{broadcast_address}}, the {{GossipDigestAckVerbHandler}} and the {{GossipDigestAck2VerbHandler}} also need to check {{clusterId}} for mismatches. {{GossipDigestAck}} and {{GossipDigestAck2}} don't contain {{clusterId}} currently, so we need to bump the {{MessagingService}} version to accommodate the addition of this field.
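A minimal sketch of the kind of check described above, using a simplified model rather than the actual Cassandra classes. The {{Ack}} and {{Node}} types, the {{handleAck}} method, and the addresses are hypothetical names for illustration only; the real handlers and message serialization differ.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model. None of these types are Cassandra's real classes.
final class GossipAckSketch {

    /** Simplified ack message; clusterId stands in for the field proposed in the comment. */
    static final class Ack {
        final String clusterId;
        final Map<String, String> endpointStates; // endpoint address -> state

        Ack(String clusterId, Map<String, String> endpointStates) {
            this.clusterId = clusterId;
            this.endpointStates = endpointStates;
        }
    }

    static final class Node {
        final String clusterName;
        final Map<String, String> knownEndpoints = new HashMap<>();

        Node(String clusterName) {
            this.clusterName = clusterName;
        }

        /** Applies the ack only if the sender's cluster matches; returns whether it was applied. */
        boolean handleAck(Ack ack) {
            if (!clusterName.equals(ack.clusterId)) {
                return false; // mismatch: drop the message, learn nothing
            }
            knownEndpoints.putAll(ack.endpointStates);
            return true;
        }
    }

    public static void main(String[] args) {
        Node nodeB = new Node("ClusterB");
        Ack fromA = new Ack("ClusterA", Collections.singletonMap("10.0.0.1", "NORMAL"));
        System.out.println("applied: " + nodeB.handleAck(fromA));
        System.out.println("known endpoints: " + nodeB.knownEndpoints);
    }
}
```

In this sketch the ack from the other cluster is rejected before any endpoint state is merged, which is the behavior the Syn handler already provides for Syn messages.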

The reason this metadata contamination is unidirectional is as follows:
1. New node sends {{GossipDigestSyn}} asking for all info.
2. Node from cluster A replies to cluster B node with shared broadcast address, adding info for all nodes from cluster A and asking for no info.
3. Cluster B node doesn't share cluster B data since it hasn't been requested.

All subsequent direct gossiping between the two clusters is blocked by the {{GossipDigestSynVerbHandler}}.
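The one-way flow can be illustrated with a toy model. This is only a sketch under stated assumptions: {{Node}}, {{acceptSyn}}, and {{handleAck}} are hypothetical names, the addresses are made up, and the real gossip messages carry digests and versioned state rather than plain maps.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of the exchange; not Cassandra's real classes or wire format.
final class UnidirectionalGossipSketch {

    static final class Node {
        final String clusterName;
        final Map<String, String> endpointStates = new HashMap<>(); // address -> cluster it belongs to

        Node(String clusterName, String ownAddress) {
            this.clusterName = clusterName;
            endpointStates.put(ownAddress, clusterName);
        }

        /** Syn handling: the cluster name is checked, so foreign Syns are rejected. */
        boolean acceptSyn(String senderClusterName) {
            return clusterName.equals(senderClusterName);
        }

        /**
         * Ack handling without a cluster check (the behavior described in the comment):
         * merge everything pushed, then reply only with the states that were requested.
         */
        Map<String, String> handleAck(Map<String, String> pushed, Iterable<String> requested) {
            endpointStates.putAll(pushed);
            Map<String, String> reply = new HashMap<>();
            for (String endpoint : requested) {
                if (endpointStates.containsKey(endpoint)) {
                    reply.put(endpoint, endpointStates.get(endpoint));
                }
            }
            return reply;
        }
    }

    public static void main(String[] args) {
        Node nodeA = new Node("ClusterA", "10.0.0.1");
        Node nodeB = new Node("ClusterB", "10.1.0.1");

        // Step 2: the cluster A node's ack lands on the cluster B node, pushing all of
        // cluster A's state and requesting nothing.
        Map<String, String> ack2 = nodeB.handleAck(
                new HashMap<>(nodeA.endpointStates), Collections.<String>emptyList());

        // Step 3: nothing was requested, so the reply carries no cluster B data.
        System.out.println("reply to A is empty: " + ack2.isEmpty());
        System.out.println("B now knows: " + nodeB.endpointStates.keySet());
        System.out.println("A still knows: " + nodeA.endpointStates.keySet());

        // Later direct gossip from B to A starts with a Syn, which A rejects.
        System.out.println("A accepts Syn from B: " + nodeA.acceptSyn(nodeB.clusterName));
    }
}
```

In this model the cluster B node ends up holding cluster A's endpoint, while the cluster A node learns nothing about cluster B.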

I have a working fix for this; we need to decide when a {{MessagingService}} bump will occur.

Thanks for the report!

> reconnecting snitch can bypass cluster name check
> -------------------------------------------------
>
>                 Key: CASSANDRA-10111
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10111
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata
>         Environment: 2.0.x
>            Reporter: Chris Burroughs
>            Assignee: Joel Knighton
>              Labels: gossip
>             Fix For: 2.1.x
>
>
> Setup:
>  * Two clusters: A & B
>  * Both are two-DC clusters
>  * Both use GossipingPropertyFileSnitch with different listen_address/broadcast_address
> A new node was added to cluster A with a broadcast_address of an existing node in cluster B (due to an out-of-date DNS entry).  Cluster B added all of the nodes from cluster A, somehow bypassing the cluster name mismatch check for these nodes.  The first reference to cluster A nodes in cluster B logs is when they were added:
> {noformat}
>  INFO [GossipStage:1] 2015-08-17 15:08:33,858 Gossiper.java (line 983) Node /8.37.70.168 is now part of the cluster
> {noformat}
> Cluster B nodes then tried to gossip to cluster A nodes, but cluster A kept them out with 'ClusterName mismatch'.  Cluster B, however, tried to send reads/writes to cluster A and general mayhem ensued.
> Obviously this is a Bad (TM) config that Should Not Be Done.  However, since the consequences of crazy merged clusters are really bad (the reason the name mismatch check exists in the first place), I think the hole is reasonable to plug.  I'm not sure exactly what the code path is that skips the check in GossipDigestSynVerbHandler.


