Posted to user@cassandra.apache.org by Ted Zlatanov <tz...@lifelogs.com> on 2010/03/01 19:15:11 UTC

finding Cassandra servers

I need to find Cassandra servers on my network from several types of
clients and platforms.  The goal is to make adding and removing servers
painless, assuming a leading window of at least 1 hour.  The discovery
should be automatic and distributed.  I want to minimize management.

Round-robin DNS with a 1-hour TTL would work all right, but I was
wondering if Bonjour/Zeroconf is a better idea and what else should I
consider.

Thanks
Ted


Re: finding Cassandra servers

Posted by Gary Dusbabek <gd...@gmail.com>.
-1 core.
+1 contrib.
+10 github.

Client-endpoint discovery is not currently addressed at all in the
codebase.  I don't think it is a job we should take up because needs
will vary across applications and there isn't a general solution that
will work for everybody.

Gary

2010/3/3 Ted Zlatanov <tz...@lifelogs.com>:
> On Wed, 3 Mar 2010 09:32:33 -0600 Gary Dusbabek <gd...@gmail.com> wrote:
>
> GD> 2010/3/3 Ted Zlatanov <tz...@lifelogs.com>:
>>> This requires knowledge of the seeds so I need to at least look in
>>> storage-conf.xml to find them.  Are you saying there's no chance of
>>> Cassandra nodes (or just seeds) announcing themselves, even if it's
>>> optional behavior that's off by default?  If so I'll do the contrib mDNS
>>> service but it really seems like a backward way to do things.
>
> GD> Nodes already announce themselves, only just to the cluster.  That's
> GD> what gossip is for.  I don't see the point of making the announcement
> GD> to the subnet at large.
>
> GD> The decision rests with the community.  Obviously, if there is enough
> GD> merit to this work, it will find its way into the codebase.  I just
> GD> think it falls into the realm of shiny-and-neat (mdns and automatic
> GD> discovery is cool) and not in the realm of pragmatic (not reliable
> GD> across subnets).
>
> It's currently not possible to find a usable node without running
> centralized services like RRDNS or a special mDNS broadcaster as you
> suggested.  I don't think this is shiny and neat, it's a matter of
> running in a true decentralized environment (which Cassandra is supposed
> to fit into).
>
> The subnet limitation is not an issue in my environment (we forward
> much, much larger multicast volumes routinely) but I understand routing
> multicasts is not everyone's cup of tea.  IMHO it's better than the
> current situation and, mDNS being a well-known standard, can at least be
> handled at the switch level without code changes.
>
> I can do a patch+ticket for this in the core, making it optional and off
> by default, or do the same for a contrib/ service as you suggested.  So
> I'd appreciate a +1/-1 quick vote on whether this can go in the core to
> save me from rewriting the patch later.
>
> Ted
>
>

Re: finding Cassandra servers

Posted by Jonathan Ellis <jb...@gmail.com>.
We appear to be reaching consensus that this is solving a non-problem,
so I have closed that ticket.

2010/3/3 Ted Zlatanov <tz...@lifelogs.com>:
> On Wed, 3 Mar 2010 12:08:06 -0500 Ian Holsman <ia...@holsman.net> wrote:
>
> IH> We could create a branch or git fork where you guys could develop it,
> IH> and if it reaches a usable state and others find it interesting it
> IH> could get integrated then
>
> Thanks, Ian.  Would it be OK to do it as a patch in
> http://issues.apache.org/jira/browse/CASSANDRA-846?  Or is there a
> reason for using a branch/fork instead?
>
> Ted
>
>

Re: finding Cassandra servers

Posted by Ted Zlatanov <tz...@lifelogs.com>.
On Wed, 3 Mar 2010 12:08:06 -0500 Ian Holsman <ia...@holsman.net> wrote: 

IH> We could create a branch or git fork where you guys could develop it,
IH> and if it reaches a usable state and others find it interesting it
IH> could get integrated then

Thanks, Ian.  Would it be OK to do it as a patch in
http://issues.apache.org/jira/browse/CASSANDRA-846?  Or is there a
reason for using a branch/fork instead?

Ted


Re: finding Cassandra servers

Posted by Ian Holsman <ia...@holsman.net>.
+1 on Eric's comments
We could create a branch or git fork where you guys could develop it,
and if it reaches a usable state and others find it interesting it
could get integrated then


On 3/3/10, Eric Evans <ee...@rackspace.com> wrote:
> On Wed, 2010-03-03 at 10:05 -0600, Ted Zlatanov wrote:
>> I can do a patch+ticket for this in the core, making it optional and
>> off by default, or do the same for a contrib/ service as you
>> suggested.  So I'd appreciate a +1/-1 quick vote on whether this can
>> go in the core to save me from rewriting the patch later.
>
> I don't think voting is going to help. Voting doesn't do anything to
> develop consensus and it seems pretty clear that no consensus exists
> here.
>
> It's entirely possible that you've identified a problem that others
> can't see, or haven't yet encountered. I don't see it, but then maybe
> I'm just thick.
>
> Either way, if you think this is important, the onus is on you to
> demonstrate the merit of your idea and contrib/ or a github project is
> one way to do that (the latter has the advantage of not needing to rely
> on anyone else).
>
>
> --
> Eric Evans
> eevans@rackspace.com
>
>

-- 
Sent from my mobile device

Re: finding Cassandra servers

Posted by Brandon Williams <dr...@gmail.com>.
2010/3/3 Ted Zlatanov <tz...@lifelogs.com>

> On Wed, 3 Mar 2010 09:04:37 -0800 Ryan King <ry...@twitter.com> wrote:
>
> RK> Something like RRDNS is no more complex than managing a list of seed
> nodes.
>
> My concern is that both RRDNS and seed node lists are vulnerable to
> individual node failure.


They're not.  That's why they're lists.  If one doesn't work out, move along
to the next.


>  Updating DNS when a node dies means you have
> to wait until the TTL expires, and if you lower the TTL too much your
> server will get killed.
>

Don't do that.  Make your clients keep trying.  Any failure is likely to be
transient anyway, so running around messing with DNS every time a machine is
offline doesn't make much sense.
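
Roughly, "keep trying" can be as simple as the sketch below (the round-robin
name cassandra.example.com and the Thrift port 9160 are placeholders, not
anything your setup has to use):

    import java.io.IOException;
    import java.net.InetAddress;
    import java.net.Socket;

    public class NodeFinder {
        // Resolve the round-robin name and return a socket to the first node
        // that accepts a connection; the caller wraps it in a Thrift transport.
        public static Socket connectToAnyNode(String rrdnsName, int port)
                throws IOException {
            InetAddress[] candidates = InetAddress.getAllByName(rrdnsName);
            for (InetAddress node : candidates) {
                try {
                    return new Socket(node, port);  // first live node wins
                } catch (IOException e) {
                    // node down or unreachable; move along to the next A record
                }
            }
            throw new IOException("no Cassandra node reachable via " + rrdnsName);
        }
    }

A client would call connectToAnyNode("cassandra.example.com", 9160) once per
connection and let DNS ordering spread the load.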

-Brandon

Re: finding Cassandra servers

Posted by Ted Zlatanov <tz...@lifelogs.com>.
On Wed, 3 Mar 2010 09:35:31 -0800 Ryan King <ry...@twitter.com> wrote: 

>> With seed node lists, if I get unlucky I'd be trying to hit a downed
>> node in which case I may as well just use RRDNS and deal with connection
>> failure from the start.

RK> Why would you not deal with connection failure?

I mean it's simpler to deal with one type of connection failure (to any
node in RRDNS) than with multiple types (first to a seed node to get the
node list, then to a random active node from that list).  Sorry if my
phrasing was confusing.

Ted


Re: finding Cassandra servers

Posted by Ryan King <ry...@twitter.com>.
2010/3/3 Ted Zlatanov <tz...@lifelogs.com>:
> On Wed, 3 Mar 2010 09:04:37 -0800 Ryan King <ry...@twitter.com> wrote:
>
> RK> Something like RRDNS is no more complex than managing a list of seed nodes.
>
> How do your clients at Twitter find server nodes?  Do you just run them
> local to each node?

RRDNS + loading the token map to discover more servers. Our
implementation is open source:
http://github.com/fauna/cassandra/blob/master/lib/cassandra/cassandra.rb

> My concern is that both RRDNS and seed node lists are vulnerable to
> individual node failure.  Updating DNS when a node dies means you have
> to wait until the TTL expires, and if you lower the TTL too much your
> server will get killed.

If you combine it with a fault-tolerant Thrift client and loading the
token map, it works fine.

> With seed node lists, if I get unlucky I'd be trying to hit a downed
> node in which case I may as well just use RRDNS and deal with connection
> failure from the start.

Why would you not deal with connection failure?

-ryan

Re: finding Cassandra servers

Posted by Ted Zlatanov <tz...@lifelogs.com>.
On Wed, 3 Mar 2010 09:19:28 -0800 Chris Goffinet <go...@digg.com> wrote: 

CG> At Digg we have automated infrastructure. We use Puppet + our own
CG> in-house system that allows us to query pools of nodes for
CG> 'seeds'. Config files like storage-conf.xml are auto-generated on
CG> the fly, and we randomly pick a set of seeds.

CG> Seeds can be per datacenter as well. As soon as a machine is
CG> decommissioned, it no longer gets picked as seed.

On Wed, 3 Mar 2010 11:20:07 -0600 Brandon Williams <dr...@gmail.com> wrote: 

BW> 2010/3/3 Ted Zlatanov <tz...@lifelogs.com>
>> My concern is that both RRDNS and seed node lists are vulnerable to
>> individual node failure.

BW> They're not.  That's why they're lists.  If one doesn't work out, move along
BW> to the next.

>> Updating DNS when a node dies means you have
>> to wait until the TTL expires, and if you lower the TTL too much your
>> server will get killed.

BW> Don't do that.  Make your clients keep trying.  Any failure is likely to be
BW> transient anyway, so running around messing with DNS every time a machine is
BW> offline doesn't make much sense.

Thanks for the advice.  I am probably being paranoid about the
connection timeout; we're using Puppet as well so I'll just use it to
generate the seeds portion of the config file *and* a plain list of seed
nodes that each client can retrieve (so they don't have to parse the
XML).

On Wed, 3 Mar 2010 11:22:45 -0600 Jonathan Ellis <jb...@gmail.com> wrote: 

JE> We appear to be reaching consensus that this is solving a non-problem,
JE> so I have closed that ticket.

Sure.  Thanks for everyone's opinion, I really appreciate it.

Ted


Re: finding Cassandra servers

Posted by Chris Goffinet <go...@digg.com>.
At Digg we have automated infrastructure. We use Puppet + our own in-house system that allows us to query pools of nodes for 'seeds'. Config files like storage-conf.xml are auto-generated on the fly, and we randomly pick a set of seeds.

Seeds can be per datacenter as well. As soon as a machine is decommissioned, it no longer gets picked as seed.
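
For reference, the piece our tooling templates is just the Seeds section of
storage-conf.xml; the hostnames here are placeholders:

    <Seeds>
        <Seed>cass-seed-01.example.com</Seed>
        <Seed>cass-seed-02.example.com</Seed>
        <Seed>cass-seed-03.example.com</Seed>
    </Seeds>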

-Chris

On Mar 3, 2010, at 9:12 AM, Ted Zlatanov wrote:

> On Wed, 3 Mar 2010 09:04:37 -0800 Ryan King <ry...@twitter.com> wrote: 
> 
> RK> Something like RRDNS is no more complex than managing a list of seed nodes.
> 
> How do your clients at Twitter find server nodes?  Do you just run them
> local to each node?
> 
> My concern is that both RRDNS and seed node lists are vulnerable to
> individual node failure.  Updating DNS when a node dies means you have
> to wait until the TTL expires, and if you lower the TTL too much your
> server will get killed.
> 
> With seed node lists, if I get unlucky I'd be trying to hit a downed
> node in which case I may as well just use RRDNS and deal with connection
> failure from the start.
> 
> Ted
> 


Re: finding Cassandra servers

Posted by Ted Zlatanov <tz...@lifelogs.com>.
On Wed, 3 Mar 2010 09:04:37 -0800 Ryan King <ry...@twitter.com> wrote: 

RK> Something like RRDNS is no more complex than managing a list of seed nodes.

How do your clients at Twitter find server nodes?  Do you just run them
local to each node?

My concern is that both RRDNS and seed node lists are vulnerable to
individual node failure.  Updating DNS when a node dies means you have
to wait until the TTL expires, and if you lower the TTL too much your
server will get killed.

With seed node lists, if I get unlucky I'd be trying to hit a downed
node in which case I may as well just use RRDNS and deal with connection
failure from the start.

Ted


Re: finding Cassandra servers

Posted by Ryan King <ry...@twitter.com>.
2010/3/3 Ted Zlatanov <tz...@lifelogs.com>:
> On Wed, 03 Mar 2010 10:43:19 -0600 Eric Evans <ee...@rackspace.com> wrote:
>
> EE> It's entirely possible that you've identified a problem that others
> EE> can't see, or haven't yet encountered. I don't see it, but then maybe
> EE> I'm just thick.
>
> Getting back to my original question, how do you (and others) find
> usable Cassandra nodes from your clients?  It's supposed to be a
> decentralized database and yet I only know of centralized ways (RRDNS)
> to locate nodes.  Contacting the seeds is not a decentralized solution
> and sidesteps the issue.  It also complicates the client unnecessarily.

It's not supposed to be a decentralized database; it's supposed to be a
distributed database.

Something like RRDNS is no more complex than managing a list of seed nodes.

-ryan

> EE> Either way, if you think this is important, the onus is on you to
> EE> demonstrate the merit of your idea and contrib/ or a github project is
> EE> one way to do that (the latter has the advantage of not needing to rely
> EE> on anyone else).
>
> I'll submit a core patch in a jira ticket.  It's much easier than
> writing a full application and IMHO much more useful because it "just
> works."  If it gets rejected I'll move to contrib/ as you and Gary
> suggested.
>
> Ted
>
>

Re: finding Cassandra servers

Posted by Ted Zlatanov <tz...@lifelogs.com>.
On Wed, 03 Mar 2010 10:43:19 -0600 Eric Evans <ee...@rackspace.com> wrote: 

EE> It's entirely possible that you've identified a problem that others
EE> can't see, or haven't yet encountered. I don't see it, but then maybe
EE> I'm just thick.

Getting back to my original question, how do you (and others) find
usable Cassandra nodes from your clients?  It's supposed to be a
decentralized database and yet I only know of centralized ways (RRDNS)
to locate nodes.  Contacting the seeds is not a decentralized solution
and sidesteps the issue.  It also complicates the client unnecessarily.

EE> Either way, if you think this is important, the onus is on you to
EE> demonstrate the merit of your idea and contrib/ or a github project is
EE> one way to do that (the latter has the advantage of not needing to rely
EE> on anyone else).

I'll submit a core patch in a jira ticket.  It's much easier than
writing a full application and IMHO much more useful because it "just
works."  If it gets rejected I'll move to contrib/ as you and Gary
suggested.

Ted


Re: finding Cassandra servers

Posted by Christopher Brind <br...@brindy.org.uk>.
Great,  thanks Eric

On 3 Mar 2010 17:27, "Eric Evans" <ee...@rackspace.com> wrote:

On Wed, 2010-03-03 at 16:49 +0000, Christopher Brind wrote:
> So is the current general practice to ...
There are so many ways you could tackle this but...

If you're talking about provisioning/startup of new nodes, just use the
IPs of 2-4 nodes in the seeds section of configs.

If you're talking about clients, then round-robin DNS is one option.
Load-balancers are another. Either could be used with a subset of
higher-capacity/higher-availability nodes, or for the entire cluster.


> If so, what happens if that node is down? Is the entire cluster
> effectively broken at that poi...
You don't use just one node, see above.


> Or do clients simply maintain a list of nodes and just connect to the
> first available in the list...
It's possible to obtain a list of nodes over Thrift. So, yet another
option would be to use a short-list of well-known nodes (discovered via
round-robin DNS for example), to obtain a current node list and
distribute among them.

--

Eric Evans
eevans@rackspace.com

Re: finding Cassandra servers

Posted by Ryan King <ry...@twitter.com>.
On Wed, Mar 3, 2010 at 9:27 AM, Eric Evans <ee...@rackspace.com> wrote:
> On Wed, 2010-03-03 at 16:49 +0000, Christopher Brind wrote:
>> So is the current general practice to connect to a known node, e.g. by
>> ip address?
>
> There are so many ways you could tackle this but...
>
> If you're talking about provisioning/startup of new nodes, just use the
> IPs of 2-4 nodes in the seeds section of configs.
>
> If you're talking about clients, then round-robin DNS is one option.
> Load-balancers are another. Either could be used with a subset of
> higher-capacity/higher-availability nodes, or for the entire cluster.
>
>> If so, what happens if that node is down?  Is the entire cluster
>> effectively broken at that point?
>
> You don't use just one node, see above.
>
>> Or do clients simply maintain a list of nodes and just connect to the
>> first available in the list?
>
> It's possible to obtain a list of nodes over Thrift. So, yet another
> option would be to use a short-list of well-known nodes (discovered via
> round-robin DNS for example), to obtain a current node list and
> distribute among them.

This is exactly what we do.

-ryan

Re: finding Cassandra servers

Posted by Eric Evans <ee...@rackspace.com>.
On Wed, 2010-03-03 at 16:49 +0000, Christopher Brind wrote:
> So is the current general practice to connect to a known node, e.g. by
> ip address?

There are so many ways you could tackle this but...

If you're talking about provisioning/startup of new nodes, just use the
IPs of 2-4 nodes in the seeds section of configs.

If you're talking about clients, then round-robin DNS is one option.
Load-balancers are another. Either could be used with a subset of
higher-capacity/higher-availability nodes, or for the entire cluster.

> If so, what happens if that node is down?  Is the entire cluster
> effectively broken at that point?

You don't use just one node, see above.

> Or do clients simply maintain a list of nodes and just connect to the
> first available in the list? 

It's possible to obtain a list of nodes over Thrift. So, yet another
option would be to use a short-list of well-known nodes (discovered via
round-robin DNS for example), to obtain a current node list and
distribute among them.
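
As a rough sketch of that last option (assuming you already have the current
endpoint list in hand, however you fetched it; the class and names below are
purely illustrative):

    import java.util.List;
    import java.util.Random;

    public class EndpointPicker {
        private final List<String> endpoints;  // current node list from the cluster
        private final Random random = new Random();

        public EndpointPicker(List<String> endpoints) {
            this.endpoints = endpoints;
        }

        // Spread new connections across the known nodes instead of pinning
        // every client to the same well-known host.
        public String next() {
            return endpoints.get(random.nextInt(endpoints.size()));
        }
    }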

-- 
Eric Evans
eevans@rackspace.com


Re: finding Cassandra servers

Posted by Christopher Brind <br...@brindy.org.uk>.
So is the current general practice to connect to a known node, e.g. by ip
address?

If so, what happens if that node is down?  Is the entire cluster effectively
broken at that point?

Or do clients simply maintain a list of nodes and just connect to the first
available in the list?

Thanks in advance.

Cheers
Chris

On 3 Mar 2010 16:43, "Eric Evans" <ee...@rackspace.com> wrote:

On Wed, 2010-03-03 at 10:05 -0600, Ted Zlatanov wrote:
> I can do a patch+ticket for this in the cor...
I don't think voting is going to help. Voting doesn't do anything to
develop consensus and it seems pretty clear that no consensus exists
here.

It's entirely possible that you've identified a problem that others
can't see, or haven't yet encountered. I don't see it, but then maybe
I'm just thick.

Either way, if you think this is important, the onus is on you to
demonstrate the merit of your idea and contrib/ or a github project is
one way to do that (the latter has the advantage of not needing to rely
on anyone else).


--
Eric Evans
eevans@rackspace.com

Re: finding Cassandra servers

Posted by Eric Evans <ee...@rackspace.com>.
On Wed, 2010-03-03 at 10:05 -0600, Ted Zlatanov wrote:
> I can do a patch+ticket for this in the core, making it optional and
> off by default, or do the same for a contrib/ service as you
> suggested.  So I'd appreciate a +1/-1 quick vote on whether this can
> go in the core to save me from rewriting the patch later.

I don't think voting is going to help. Voting doesn't do anything to
develop consensus and it seems pretty clear that no consensus exists
here.

It's entirely possible that you've identified a problem that others
can't see, or haven't yet encountered. I don't see it, but then maybe
I'm just thick.

Either way, if you think this is important, the onus is on you to
demonstrate the merit of your idea and contrib/ or a github project is
one way to do that (the latter has the advantage of not needing to rely
on anyone else).


-- 
Eric Evans
eevans@rackspace.com


Re: finding Cassandra servers

Posted by Ted Zlatanov <tz...@lifelogs.com>.
On Wed, 3 Mar 2010 09:32:33 -0600 Gary Dusbabek <gd...@gmail.com> wrote: 

GD> 2010/3/3 Ted Zlatanov <tz...@lifelogs.com>:
>> This requires knowledge of the seeds so I need to at least look in
>> storage-conf.xml to find them.  Are you saying there's no chance of
>> Cassandra nodes (or just seeds) announcing themselves, even if it's
>> optional behavior that's off by default?  If so I'll do the contrib mDNS
>> service but it really seems like a backward way to do things.

GD> Nodes already announce themselves, only just to the cluster.  That's
GD> what gossip is for.  I don't see the point of making the announcement
GD> to the subnet at large.

GD> The decision rests with the community.  Obviously, if there is enough
GD> merit to this work, it will find its way into the codebase.  I just
GD> think it falls into the realm of shiny-and-neat (mdns and automatic
GD> discovery is cool) and not in the realm of pragmatic (not reliable
GD> across subnets).

It's currently not possible to find a usable node without running
centralized services like RRDNS or a special mDNS broadcaster as you
suggested.  I don't think this is shiny and neat, it's a matter of
running in a true decentralized environment (which Cassandra is supposed
to fit into).

The subnet limitation is not an issue in my environment (we forward
much, much larger multicast volumes routinely) but I understand routing
multicasts is not everyone's cup of tea.  IMHO it's better than the
current situation and, mDNS being a well-known standard, can at least be
handled at the switch level without code changes.

I can do a patch+ticket for this in the core, making it optional and off
by default, or do the same for a contrib/ service as you suggested.  So
I'd appreciate a +1/-1 quick vote on whether this can go in the core to
save me from rewriting the patch later.

Ted


Re: finding Cassandra servers

Posted by Gary Dusbabek <gd...@gmail.com>.
2010/3/3 Ted Zlatanov <tz...@lifelogs.com>:
> On Wed, 3 Mar 2010 08:41:18 -0600 Gary Dusbabek <gd...@gmail.com> wrote:
>
> GD> It wouldn't be a lot of work for you to write an mDNS service that would
> GD> query the seeds for endpoints and publish it to interested clients.
> GD> It could go in contrib.
>
> This requires knowledge of the seeds so I need to at least look in
> storage-conf.xml to find them.  Are you saying there's no chance of
> Cassandra nodes (or just seeds) announcing themselves, even if it's
> optional behavior that's off by default?  If so I'll do the contrib mDNS
> service but it really seems like a backward way to do things.
>
> Ted
>
>

Nodes already announce themselves, only just to the cluster.  That's
what gossip is for.  I don't see the point of making the announcement
to the subnet at large.

The decision rests with the community.  Obviously, if there is enough
merit to this work, it will find its way into the codebase.  I just
think it falls into the realm of shiny-and-neat (mdns and automatic
discovery is cool) and not in the realm of pragmatic (not reliable
across subnets).

Gary.

Re: finding Cassandra servers

Posted by Ted Zlatanov <tz...@lifelogs.com>.
On Wed, 3 Mar 2010 08:41:18 -0600 Gary Dusbabek <gd...@gmail.com> wrote: 

GD> It wouldn't be a lot of work for you to write an mDNS service that would
GD> query the seeds for endpoints and publish it to interested clients.
GD> It could go in contrib.

This requires knowledge of the seeds so I need to at least look in
storage-conf.xml to find them.  Are you saying there's no chance of
Cassandra nodes (or just seeds) announcing themselves, even if it's
optional behavior that's off by default?  If so I'll do the contrib mDNS
service but it really seems like a backward way to do things.

Ted


Re: finding Cassandra servers

Posted by Gary Dusbabek <gd...@gmail.com>.
2010/3/3 Ted Zlatanov <tz...@lifelogs.com>:
>
> I don't think routing multicasts across subnets is a burden.

Try telling that to a network administrator who is concerned about
flooding his routers with multicast chatter.  First, you'll have to
find a network administrator who is willing to even have that
conversation.  That setting defaults to 'off' for very good reasons.

>
> GD> RRDNS would work, but something would need to keep that updated when
> GD> servers go away (it wouldn't be automatic).
>
> GD> If you can count on one of your (seed nodes) to be up, RRDNS could be
> GD> used to connect to one of them and fetch the token range list.  To do
> GD> this, create a thrift client and call describe_ring.  In older
> GD> versions you can get a jsonified endpoint map by calling
> GD> get_string_property('token map').
>
> It would really be much more efficient if I didn't have to maintain
> RRDNS, but could instead look at the mDNS broadcasts for the Cassandra
> service.  What you describe is a centralized model, no?
>
> With mDNS I wouldn't have to know which nodes are up or down, and I
> wouldn't have to do extra queries, it would just work.  I don't see why
> Cassandra doesn't need that functionality.  How else could you be
> guaranteed to find a live node if there is one on your subnet?
>

It wouldn't be a lot of work for you to write an mDNS service that would
query the seeds for endpoints and publish it to interested clients.
It could go in contrib.

Gary.

Re: finding Cassandra servers

Posted by Ted Zlatanov <tz...@lifelogs.com>.
On Wed, 3 Mar 2010 07:57:32 -0600 Gary Dusbabek <gd...@gmail.com> wrote: 

GD> 2010/3/3 Ted Zlatanov <tz...@lifelogs.com>:
TZ> I need to find Cassandra servers on my network from several types of
TZ> clients and platforms.  The goal is to make adding and removing servers
TZ> painless, assuming a leading window of at least 1 hour.  The discovery
TZ> should be automatic and distributed.  I want to minimize management.

GD> Nothing in the current codebase meets these needs.  But then
GD> again, Cassandra doesn't need the described functionality.  Zeroconf
GD> confines itself to a single subnet (would require router configuration
GD> to work across subnets so that multicast goes through).  

I looked it up, and today mDNS seems to be the standard name for this
protocol (Bonjour/Rendezvous on Apple).  Zeroconf seems to be the older
name, and there's a *lot* of name confusion, so I'll just stick to "mDNS."

Here's a decent Java implementation: http://sourceforge.net/projects/jmdns/
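
For a rough idea of what I mean, a node could announce itself and clients
could browse, along these lines (assuming the jmDNS API roughly as documented
on that project page; the service type "_cassandra._tcp.local." and the port
9160 are my own assumptions, nothing Cassandra defines):

    import java.net.InetAddress;
    import javax.jmdns.JmDNS;
    import javax.jmdns.ServiceEvent;
    import javax.jmdns.ServiceInfo;
    import javax.jmdns.ServiceListener;

    public class MdnsSketch {
        static final String TYPE = "_cassandra._tcp.local.";  // made-up service type

        // On a node: announce the Thrift endpoint.
        static void announce() throws Exception {
            JmDNS jmdns = JmDNS.create(InetAddress.getLocalHost());
            ServiceInfo info = ServiceInfo.create(TYPE,
                    InetAddress.getLocalHost().getHostName(), 9160, "thrift endpoint");
            jmdns.registerService(info);
        }

        // On a client: watch nodes come and go instead of maintaining a list.
        static void discover() throws Exception {
            final JmDNS jmdns = JmDNS.create(InetAddress.getLocalHost());
            jmdns.addServiceListener(TYPE, new ServiceListener() {
                public void serviceAdded(ServiceEvent event) {
                    jmdns.requestServiceInfo(event.getType(), event.getName());
                }
                public void serviceRemoved(ServiceEvent event) {
                    System.out.println("node gone: " + event.getName());
                }
                public void serviceResolved(ServiceEvent event) {
                    System.out.println("node found: " + event.getInfo());
                }
            });
        }
    }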

I don't think routing multicasts across subnets is a burden.

GD> RRDNS would work, but something would need to keep that updated when
GD> servers go away (it wouldn't be automatic).

GD> If you can count on one of your (seed nodes) to be up, RRDNS could be
GD> used to connect to one of them and fetch the token range list.  To do
GD> this, create a thrift client and call describe_ring.  In older
GD> versions you can get a jsonified endpoint map by calling
GD> get_string_property('token map').

It would really be much more efficient if I didn't have to maintain
RRDNS, but could instead look at the mDNS broadcasts for the Cassandra
service.  What you describe is a centralized model, no?

With mDNS I wouldn't have to know which nodes are up or down, and I
wouldn't have to do extra queries, it would just work.  I don't see why
Cassandra doesn't need that functionality.  How else could you be
guaranteed to find a live node if there is one on your subnet?

Ted


Re: finding Cassandra servers

Posted by Gary Dusbabek <gd...@gmail.com>.
2010/3/3 Ted Zlatanov <tz...@lifelogs.com>:
> On Mon, 01 Mar 2010 12:15:11 -0600 Ted Zlatanov <tz...@lifelogs.com> wrote:
>
> TZ> I need to find Cassandra servers on my network from several types of
> TZ> clients and platforms.  The goal is to make adding and removing servers
> TZ> painless, assuming a leading window of at least 1 hour.  The discovery
> TZ> should be automatic and distributed.  I want to minimize management.
>
> TZ> Round-robin DNS with a 1-hour TTL would work all right, but I was
> TZ> wondering if Bonjour/Zeroconf is a better idea and what else should I
> TZ> consider.
>
> So... is this a dumb question or is there no good answer currently to
> discovering Cassandra servers?
>
> Ted
>
>

Nothing in the current codebase meets these needs.  But then
again, Cassandra doesn't need the described functionality.  Zeroconf
confines itself to a single subnet (would require router configuration
to work across subnets so that multicast goes through).  RRDNS would
work, but something would need to keep that updated when servers go
away (it wouldn't be automatic).

If you can count on one of your (seed nodes) to be up, RRDNS could be
used to connect to one of them and fetch the token range list.  To do
this, create a thrift client and call describe_ring.  In older
versions you can get a jsonified endpoint map by calling
get_string_property('token map').
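
A rough sketch of that call against a 0.6-style API (package and method names
moved around between 0.5 and 0.6, so treat the imports below as approximate;
the host and keyspace are placeholders):

    import java.util.List;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.TokenRange;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class RingLister {
        public static void main(String[] args) throws Exception {
            TTransport transport = new TSocket("seed.example.com", 9160);
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();
            // Each TokenRange names the endpoints responsible for that range;
            // the union of all endpoints is the current node list.
            List<TokenRange> ring = client.describe_ring("Keyspace1");
            for (TokenRange range : ring) {
                System.out.println(range.start_token + " .. " + range.end_token
                        + " -> " + range.endpoints);
            }
            transport.close();
        }
    }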

Hope that helps.

Gary.

Re: finding Cassandra servers

Posted by Ted Zlatanov <tz...@lifelogs.com>.
On Mon, 01 Mar 2010 12:15:11 -0600 Ted Zlatanov <tz...@lifelogs.com> wrote: 

TZ> I need to find Cassandra servers on my network from several types of
TZ> clients and platforms.  The goal is to make adding and removing servers
TZ> painless, assuming a leading window of at least 1 hour.  The discovery
TZ> should be automatic and distributed.  I want to minimize management.

TZ> Round-robin DNS with a 1-hour TTL would work all right, but I was
TZ> wondering if Bonjour/Zeroconf is a better idea and what else should I
TZ> consider.

So... is this a dumb question or is there no good answer currently to
discovering Cassandra servers?

Ted