
Posted to user@cassandra.apache.org by Faraaz Sareshwala <fs...@quantcast.com> on 2013/06/04 02:06:10 UTC

Re: Populating seeds dynamically

All the documentation that I have read about Cassandra always says to keep the
same list of seeds on every node in the cluster. Without this, you can end up
with fragmentation within your cluster where nodes don't know about other nodes
in the cluster. In your case, sure the nodes will be in the cluster for 10
minutes but what about sporadic failures that cause them to leave the ring and
then re-enter it? At that point, you might reach the network fragmentation
issue.

I also use puppet to push out the cassandra.yaml file. I've defined the list of
seeds in my puppet class and have puppet generate the cassandra.yaml file from
an erb template.
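
As a rough sketch of the same idea (in Python rather than an erb template, and
not my actual Puppet code), the point is that one shared list is rendered
identically into every node's cassandra.yaml. The SEEDS addresses and the
render_seed_provider name are made up for illustration; the seed_provider
layout follows the stock cassandra.yaml with SimpleSeedProvider.

```python
# Hypothetical shared seed list. It is defined in exactly one place so that
# every node receives the same seeds.
SEEDS = ["10.1.1.1", "10.2.1.1"]


def render_seed_provider(seeds):
    """Return the seed_provider section of cassandra.yaml for the given seeds."""
    if not seeds:
        raise ValueError("at least one seed is required")
    joined = ",".join(seeds)
    return (
        "seed_provider:\n"
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider\n"
        "      parameters:\n"
        f'          - seeds: "{joined}"\n'
    )


if __name__ == "__main__":
    # Every node gets this same text, whatever its own address is.
    print(render_seed_provider(SEEDS))
```

Changing the seeds then means editing that one list and letting the config
management tool regenerate the file everywhere.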

Hopefully that helps a bit :).

Faraaz

On Mon, Jun 03, 2013 at 04:59:23PM -0700, Aiman Parvaiz wrote:
> Hi all
> I am using puppet to push a cassandra.yaml file that has the seed nodes hardcoded. Going forward, I don't want to hardcode the seed nodes; instead I plan to maintain a list of seed nodes. Since I already have a cluster in place, I would populate this list from it to start with. The next time I add a node, this list would be consulted, and three nodes from it would be read and written as seeds in the yaml file.
> 
> This implementation can lead to different nodes running with different seeds. I know this is not an ideal situation, but I believe that if a node has been in the ring long enough (say 10 minutes, so that it knows about the other nodes in the ring), then it can be used as a seed node.
> 
> What do you guys think of populating seeds this way? Also, please shed some light on why running different seeds is not a best practice (assuming that all potential seed candidates have been in the ring long enough).
> 
> Thanks

Re: Populating seeds dynamically

Posted by Tim Wintle <ti...@gmail.com>.

On Mon, 2013-06-03 at 17:20 -0700, Aiman Parvaiz wrote:
> @Faraaz check out the comment by Aaron Morton here: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Seed-Nodes-td6077958.html
> Having the same seeds on every node is a good idea, but it is not necessary.
> > In your case, sure the nodes will be in the cluster for 10
> > minutes but what about sporadic failures that cause them to leave the ring and
> > then re-enter it? At that point, you might reach the network fragmentation
> > issue.
> 
> I am not sure I understand this completely. If a node leaves the ring and re-enters, it would use its seed nodes to learn about the ring, and those seeds would themselves be part of the ring, so I don't see any information lag happening here.

As I understand the risk:

If you start with four nodes in the ring:

10.1.1.1
10.1.1.2
10.2.1.1
10.2.1.2

.. Then the connection between 10.1.0.0/16 and 10.2.0.0/16 is lost
temporarily through some routing issue / firewall config etc.

You run puppet to re-deploy on 10.1.1.1 and 10.1.1.2. When they
discover nodes on the current ring, they only find the 10.1.0.0/16 nodes.

You do the same on 10.2.1.1 and 10.2.1.2, and they only find the nodes
on that segment of the ring. (I'm not used to puppet - if the config
generation is run once on a single machine, then you could still end up
with that by chance, assuming that machine can still reach both parts of
the cluster.)

At this point your networking team spot the routing issue and fix it -
but it's too late.

So you now have two disjoint rings running, and neither ring has any of
the other ring's nodes in its seed list.

.. all that is fairly unlikely to happen, but it's possible - and the
more switches and machines that are involved the more likely it becomes
that it will happen.
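
The end state of that scenario can be checked from the seed lists alone. The
sketch below is only an illustration under stated assumptions: the
seeds_by_node mapping is hypothetical input (it would have to be collected
from each node's cassandra.yaml), seed_groups is a made-up helper name, and it
models only the seed lists, not gossip itself. It groups nodes that are linked
through a seed entry; more than one group means no seed entry bridges the
groups.

```python
from collections import defaultdict


def seed_groups(seeds_by_node):
    """Group nodes into connected components of the 'lists as a seed' graph."""
    graph = defaultdict(set)
    for node, seeds in seeds_by_node.items():
        graph[node]  # make sure nodes with no seeds still appear
        for seed in seeds:
            # Treat a seed entry as an undirected link between the two nodes.
            graph[node].add(seed)
            graph[seed].add(node)

    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # iterative depth-first walk from this start node
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(graph[current] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    # Seed lists generated while the two /16 networks could not see each other.
    after_partition = {
        "10.1.1.1": ["10.1.1.2"],
        "10.1.1.2": ["10.1.1.1"],
        "10.2.1.1": ["10.2.1.2"],
        "10.2.1.2": ["10.2.1.1"],
    }
    for group in seed_groups(after_partition):
        print(group)
```

With one shared seed list on every node, the same function returns a single
group containing all four addresses.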

I'm not sure if that's the only risk - but if it is, then IMHO it's a
question of whether that risk is acceptable and, if so, of adding
monitoring to continually check for that situation.
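
One possible shape for such a check, as a sketch only: compare the set of
members each node reports against the set of nodes you expect to be in the
ring. How the views are gathered is left open (for example, by parsing each
node's ring listing); the view_by_node mapping and the find_split_views name
are assumptions made for this example.

```python
def find_split_views(expected_nodes, view_by_node):
    """Return {node: sorted list of expected members missing from its view}.

    A node that reported nothing at all is treated as missing every member.
    An empty result means every expected node sees the whole expected ring.
    """
    expected = set(expected_nodes)
    problems = {}
    for node in sorted(expected):
        view = view_by_node.get(node)
        missing = expected if view is None else expected - set(view)
        if missing:
            problems[node] = sorted(missing)
    return problems


if __name__ == "__main__":
    expected = ["10.1.1.1", "10.1.1.2", "10.2.1.1", "10.2.1.2"]
    # Hypothetical views collected from each node after the split.
    views = {
        "10.1.1.1": {"10.1.1.1", "10.1.1.2"},
        "10.1.1.2": {"10.1.1.1", "10.1.1.2"},
        "10.2.1.1": {"10.2.1.1", "10.2.1.2"},
        "10.2.1.2": {"10.2.1.1", "10.2.1.2"},
    }
    for node, missing in find_split_views(expected, views).items():
        print(node, "cannot see", ", ".join(missing))
```

Run periodically, a non-empty result would be the signal to alert on.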

Tim

Re: Populating seeds dynamically

Posted by Aiman Parvaiz <ai...@grapheffect.com>.

@Faraaz check out the comment by Aaron Morton here: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Seed-Nodes-td6077958.html
Having the same seeds on every node is a good idea, but it is not necessary.
> In your case, sure the nodes will be in the cluster for 10
> minutes but what about sporadic failures that cause them to leave the ring and
> then re-enter it? At that point, you might reach the network fragmentation
> issue.

I am not sure I understand this completely. If a node leaves the ring and re-enters, it would use its seed nodes to learn about the ring, and those seeds would themselves be part of the ring, so I don't see any information lag happening here.
