Posted to user@cassandra.apache.org by Bill Au <bi...@gmail.com> on 2012/04/19 18:15:46 UTC

default required in cassandra-topology.properties?

All the examples of cassandra-topology.properties that I have seen have a
default entry assigning unknown nodes to a specific data center and rack.
Is it possible to have Cassandra ignore unknown nodes for the purpose of
replication?

Bill

RE: default required in cassandra-topology.properties?

Posted by Richard Lowe <ri...@arkivum.com>.
As far as I know it's not possible to leave replication factor undefined - if you do then Cassandra will default to RF=1 with SimpleStrategy.

The topology is local to each node, so unless all your nodes have the same topology file, they may each have a different idea about the topology of the cluster.

I'm not sure what you're trying to achieve here, so I'll give an example.

Say you have two datacenters, DC1 and DC2. It's perfectly possible for nodes in DC1 to have a topology file that only mentions DC1 nodes and nodes in DC2 to have a topology file that only mentions DC2 nodes. You can then define one keyspace with strategy options DC1: 3 and another with DC2: 3 and this should work fine.

However, if you had a keyspace with strategy options DC1: 3, DC2: 3 then you would AFAIK never be able to write to that keyspace, because none of the nodes know enough about the topology; they can address either DC1 or DC2, but not both.

If there were a third type of node that had topology defined for both DC1 and DC2 then these nodes would then be able to update the DC1+DC2 keyspace, even though DC1-only and DC2-only nodes would not.

So if there is a clear segregation in your data then splitting the topology may be OK, but if not then you will likely find that you can't update the keyspace unless a node has sufficient knowledge of the topology.
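
To make the example above concrete, here is a toy model in Python. It is only a sketch of the reasoning in this message, not Cassandra's actual logic: the function names `datacenters_visible` and `can_satisfy` are made up for illustration, the IP addresses are hypothetical, and the only check performed is whether a node's local topology view lists enough nodes in each datacenter that a keyspace's strategy options mention.

```python
from collections import Counter


def datacenters_visible(topology):
    """Count the nodes that a local topology view places in each datacenter."""
    return Counter(dc for dc, _rack in topology.values())


def can_satisfy(topology, strategy_options):
    """Return True if the view lists at least the requested number of
    nodes in every datacenter named in the strategy options."""
    visible = datacenters_visible(topology)
    # Counter returns 0 for a datacenter the view has never heard of.
    return all(visible[dc] >= rf for dc, rf in strategy_options.items())


# Hypothetical topology views: address -> (datacenter, rack).
dc1_view = {f"10.0.1.{i}": ("DC1", "RAC1") for i in range(1, 4)}
dc2_view = {f"10.0.2.{i}": ("DC2", "RAC1") for i in range(1, 4)}
full_view = {**dc1_view, **dc2_view}  # the "third type of node"

# Strategy options for the three keyspaces discussed above.
ks_dc1 = {"DC1": 3}
ks_dc2 = {"DC2": 3}
ks_both = {"DC1": 3, "DC2": 3}

for name, view in [("DC1-only", dc1_view), ("DC2-only", dc2_view), ("full", full_view)]:
    print(name, can_satisfy(view, ks_dc1), can_satisfy(view, ks_dc2), can_satisfy(view, ks_both))
```

In this model only the node with the full view can satisfy the DC1+DC2 keyspace, which matches the situation described above.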

Depending on your use case a simpler alternative may be to just run two clusters instead of trying to define the shape of a single one through topology definitions. I think what you're talking about here is on the edge of what Cassandra is designed to do; it works best when all nodes are uniform and have the same understanding about the cluster.

Richard



Re: default required in cassandra-topology.properties?

Posted by Bill Au <bi...@gmail.com>.
I had thought that the topology file is used for replica placement only,
such that for the token range that the unknown node is responsible for,
data is still read and written there.  It just won't be replicated since
the replication factor is not defined.

Bill


RE: default required in cassandra-topology.properties?

Posted by Richard Lowe <ri...@arkivum.com>.
Yes, it is possible. Put the following as the last line of your topology file:

default=unknown:unknown

So long as you don't have any DC or rack with this name your local node will not be able to address any nodes that aren't explicitly given in its topology file.
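
As a rough illustration of the fallback, the sketch below parses a topology file and resolves an address, using the `default` entry for anything not listed. This is not the snitch's real code: `parse_topology` and `locate` are hypothetical helpers, the addresses are invented, and the parser only handles plain `address=DC:rack` lines (no escaped colons or other properties-file subtleties).

```python
def parse_topology(text):
    """Parse 'address=DC:rack' lines into {address: (dc, rack)}.
    Blank lines and '#' comments are skipped."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        dc, _, rack = value.strip().partition(":")
        entries[key.strip()] = (dc, rack)
    return entries


def locate(entries, address):
    """Return (dc, rack) for an address, falling back to the 'default' entry."""
    return entries.get(address, entries.get("default"))


# Hypothetical topology file contents.
SAMPLE = """
# explicitly listed nodes
10.0.1.1=DC1:RAC1
10.0.1.2=DC1:RAC2

# everything else lands in a DC/rack that no keyspace refers to
default=unknown:unknown
"""

entries = parse_topology(SAMPLE)
print(locate(entries, "10.0.1.1"))
print(locate(entries, "10.9.9.9"))
```

A node that is not listed resolves to the `unknown` datacenter and rack, so a NetworkTopologyStrategy keyspace that only names DC1 would have no reason to place replicas on it.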

However, bear in mind that, whilst Cassandra won't try to use replication factor to store to these 'unknown' nodes, their token may mean that the 'natural' home for a row is on a node that is not addressable. This can create holes in your dataset and create situations where data can 'disappear' because the bloom filter says the data is on a particular node (due to its token) but the coordinator can't contact that node to get at the data.
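
The 'hole' concern can be pictured with a very simplified token ring. The sketch below is an assumption-laden toy, not how Cassandra routes requests: tokens are small integers, `natural_owner` is a hypothetical helper that picks the first node whose token is greater than or equal to the key's token (wrapping around), and the addresses are invented.

```python
from bisect import bisect_left


def natural_owner(ring, key_token):
    """ring is a list of (token, address) sorted by token.
    Return the address of the first node whose token is >= key_token,
    wrapping to the first node when key_token is past the last token."""
    tokens = [token for token, _address in ring]
    index = bisect_left(tokens, key_token) % len(ring)
    return ring[index][1]


# Hypothetical ring; the last node is not listed in the local topology file.
ring = [(0, "10.0.1.1"), (100, "10.0.1.2"), (200, "10.9.9.9")]
known = {"10.0.1.1", "10.0.1.2"}

for key_token in (50, 150, 250):
    owner = natural_owner(ring, key_token)
    status = "known" if owner in known else "unknown to this node"
    print(key_token, owner, status)
```

In this toy ring, a key with token 150 has its natural home on the node that the local topology file treats as unknown, which is the kind of gap described above.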

Careful use of replication factor and NetworkTopologyStrategy can help with this, but you should make sure that a node really doesn't need to contact the unknown nodes before marking them as such.


Richard

