Posted to user@cassandra.apache.org by Carsten Krebs <ca...@gmx.net> on 2010/08/08 14:21:14 UTC

TokenRange contains endpoints without any port information?

I'm wondering why a TokenRange returned by describe_ring(keyspace) of the Thrift API returns endpoints consisting only of an address, omitting any port information.
My first thought was that this method could be used to expose some information about the ring structure to the client, e.g. to do some client-side load balancing. But now I'm not sure about this anymore. Additionally, from looking into the code, I guess the address returned as part of the TokenRange is the address of the storage service, which could differ from the Thrift address, which in turn would make the returned endpoint useless for the client.
What is the purpose of this method, and why is the port information omitted?

TIA,

Carsten  


Re: TokenRange contains endpoints without any port information?

Posted by Benjamin Black <b...@b3k.us>.
On Sun, Aug 8, 2010 at 5:21 AM, Carsten Krebs <ca...@gmx.net> wrote:
>
> I'm wondering why a TokenRange returned by describe_ring(keyspace) of the Thrift API returns endpoints consisting only of an address, omitting any port information.
> My first thought was that this method could be used to expose some information about the ring structure to the client, e.g. to do some client-side load balancing. But now I'm not sure about this anymore. Additionally, from looking into the code, I guess the address returned as part of the TokenRange is the address of the storage service, which could differ from the Thrift address, which in turn would make the returned endpoint useless for the client.

Not just _could_ differ, is _guaranteed_ to differ.  The inter-node
protocol is not Thrift.  The returned endpoint is not useless for the
client: you had to connect to the RPC port to even make the call.  Use
the same port when connecting to the other nodes.  It is bad practice
to have RPC ports differ between nodes in the same cluster.
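
A minimal sketch of this advice in Python, assuming the ring description has already been fetched: it pairs every address-only endpoint with the RPC port the client is already using. The TokenRange namedtuple is only a stand-in for the Thrift struct, and client_endpoints, the sample addresses, and port 9160 are illustrative assumptions rather than part of any client library.

```python
from collections import namedtuple

# Stand-in for the Thrift TokenRange struct (assumed shape: a token range
# plus a list of address-only endpoints, as described in this thread).
TokenRange = namedtuple("TokenRange", ["start_token", "end_token", "endpoints"])


def client_endpoints(ring, rpc_port):
    """Pair every address in the ring with the RPC port already in use."""
    seen = []
    for token_range in ring:
        for address in token_range.endpoints:
            pair = (address, rpc_port)
            if pair not in seen:  # a node is listed once per range it holds
                seen.append(pair)
    return seen


# Hypothetical describe_ring() result for a three-node cluster.
ring = [
    TokenRange("0", "100", ["10.0.0.1", "10.0.0.2"]),
    TokenRange("100", "200", ["10.0.0.2", "10.0.0.3"]),
    TokenRange("200", "0", ["10.0.0.3", "10.0.0.1"]),
]

# 9160 is assumed to be the port the client used for the describe_ring call.
print(client_endpoints(ring, 9160))
```

This only works if every node listens for RPC on the same port, which is the practice recommended above.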

> What is the purpose of this method or respectively why is the port information omitted?
>

Discovering which nodes are in the ring and which node claims each range.


b

Re: TokenRange contains endpoints without any port information?

Posted by Gary Dusbabek <gd...@gmail.com>.
On Sun, Aug 8, 2010 at 07:21, Carsten Krebs <ca...@gmx.net> wrote:
>
> I'm wondering why a TokenRange returned by describe_ring(keyspace) of the Thrift API returns endpoints consisting only of an address, omitting any port information.
> My first thought was that this method could be used to expose some information about the ring structure to the client, e.g. to do some client-side load balancing. But now I'm not sure about this anymore. Additionally, from looking into the code, I guess the address returned as part of the TokenRange is the address of the storage service, which could differ from the Thrift address, which in turn would make the returned endpoint useless for the client.
> What is the purpose of this method

To give a picture of the ring topology.

> or respectively why is the port information omitted?

You already knew the Thrift port to make the query connection.  The
only other port you *might* need to be concerned with is the storage
port, which is assumed to be constant across the cluster.  But really,
from a client perspective it does you no good to know this port, so
why bother exposing it?

Gary.

>
> TIA,
>
> Carsten
>
>

Re: TokenRange contains endpoints without any port information?

Posted by Aaron Morton <aa...@thelastpickle.com>.
The FAQ lists round-robin DNS as the recommended way to find a node to connect to:
http://wiki.apache.org/cassandra/FAQ#node_clients_connect_to

As you say, your clients need to retry anyway. I have them hold the connection for a while (on the scale of minutes), then hit the DNS again and acquire a new connection. This lets them pick up new nodes and (I think, over time) helps keep connections balanced around the cluster.

If a node goes down for a short time, it should not have much of an effect on the clients. If you are taking a node out of the cluster, you will need to update the DNS to remove it.
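
The hold-then-re-resolve approach described above could be sketched roughly as follows in Python. RecyclingChooser, resolve, the 300-second default, and the host name in the usage note are all hypothetical names and values chosen for illustration; the resolver and clock are injectable so the logic can be exercised without real DNS.

```python
import random
import socket
import time


def resolve(hostname, port):
    """Return the distinct addresses DNS currently lists for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


class RecyclingChooser:
    """Stick with one node for max_age seconds, then ask DNS again."""

    def __init__(self, hostname, port, max_age=300.0,
                 resolver=resolve, clock=time.monotonic):
        self.hostname = hostname
        self.port = port
        self.max_age = max_age      # how long to hold a node, in seconds
        self.resolver = resolver    # injectable for testing
        self.clock = clock          # injectable for testing
        self.current = None
        self.picked_at = None

    def endpoint(self):
        """Return (address, port), re-resolving once the hold time is up."""
        now = self.clock()
        if self.current is None or now - self.picked_at >= self.max_age:
            addresses = self.resolver(self.hostname, self.port)
            self.current = (random.choice(addresses), self.port)
            self.picked_at = now
        return self.current

    def mark_failed(self):
        """Forget the current node so the next call re-resolves at once."""
        self.current = None
```

A client would call endpoint() before opening or reusing a connection, for example RecyclingChooser("cassandra.example.com", 9160).endpoint(), and call mark_failed() from its retry logic when a connection attempt fails.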

Aaron


On 10 Aug 2010, at 08:51 AM, Carsten Krebs <ca...@gmx.net> wrote:


On 08.08.2010, at 14:47 aaron morton wrote:
> 
> What sort of client-side load balancing were you thinking of? I just use round-robin DNS to distribute clients around the cluster, and have them recycle their connections every so often.
> 
I was thinking about using this method to give the client the ability to "learn" which nodes are part of the cluster, and using that information to automatically adapt the set of nodes used by the client when a node is added to or removed from the cluster.

Why do you prefer round-robin DNS for load balancing?
One advantage I see is that the client does not have to take care of the node set, and especially the management of the node set. The reason I was thinking about client-side load balancing was to avoid the need to write additional tools to monitor all nodes in the cluster and change the DNS entry if any node fails, and to do so as fast as possible to prevent clients from trying to use a dead node. But while writing this, I no longer think that this is a good point. It is just a matter of some sort of retry logic, which is needed in the client anyway.

Carsten


Re: TokenRange contains endpoints without any port information?

Posted by Carsten Krebs <ca...@gmx.net>.
On 08.08.2010, at 14:47 aaron morton wrote:
> 
> What sort of client-side load balancing were you thinking of? I just use round-robin DNS to distribute clients around the cluster, and have them recycle their connections every so often.
> 
I was thinking about using this method to give the client the ability to "learn" which nodes are part of the cluster, and using that information to automatically adapt the set of nodes used by the client when a node is added to or removed from the cluster.
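
As a rough illustration of that idea in Python, the client could periodically compare the addresses it already knows against the endpoints of a fresh ring description. The function name diff_nodes and the sample addresses are hypothetical, and the input is assumed to be one list of endpoint addresses per token range.

```python
def diff_nodes(known, ring_endpoints):
    """Compare the known node set with a fresh ring description.

    known          -- set of addresses the client currently uses
    ring_endpoints -- iterable of endpoint lists, one per token range
    Returns (current, added, removed) as sets of addresses.
    """
    current = {address for endpoints in ring_endpoints for address in endpoints}
    added = current - known      # nodes that joined since the last check
    removed = known - current    # nodes that left since the last check
    return current, added, removed


# Hypothetical example: 10.0.0.3 has left the ring and 10.0.0.4 has joined.
known = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}
fresh = [["10.0.0.1", "10.0.0.2"], ["10.0.0.2", "10.0.0.4"], ["10.0.0.4", "10.0.0.1"]]
current, added, removed = diff_nodes(known, fresh)
print(sorted(added), sorted(removed))
```

The client would then open connections to the added nodes and drop those to the removed ones.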

Why do you prefer round-robin DNS for load balancing?
One advantage I see is that the client does not have to take care of the node set, and especially the management of the node set. The reason I was thinking about client-side load balancing was to avoid the need to write additional tools to monitor all nodes in the cluster and change the DNS entry if any node fails, and to do so as fast as possible to prevent clients from trying to use a dead node. But while writing this, I no longer think that this is a good point. It is just a matter of some sort of retry logic, which is needed in the client anyway.

Carsten


Re: TokenRange contains endpoints without any port information?

Posted by aaron morton <aa...@thelastpickle.com>.
The assumption would be that you will only have one instance of Cassandra per machine, so you could contact it on the appropriate port, such as 9160 for Thrift. (I've not run this command, but I assume it returns IPv4 addresses.)

What sort of client-side load balancing were you thinking of? I just use round-robin DNS to distribute clients around the cluster, and have them recycle their connections every so often.

Aaron

On 9 Aug 2010, at 00:21, Carsten Krebs wrote:

> 
> I'm wondering why a TokenRange returned by describe_ring(keyspace) of the Thrift API returns endpoints consisting only of an address, omitting any port information.
> My first thought was that this method could be used to expose some information about the ring structure to the client, e.g. to do some client-side load balancing. But now I'm not sure about this anymore. Additionally, from looking into the code, I guess the address returned as part of the TokenRange is the address of the storage service, which could differ from the Thrift address, which in turn would make the returned endpoint useless for the client.
> What is the purpose of this method, and why is the port information omitted?
> 
> TIA,
> 
> Carsten  
>