Posted to user@cassandra.apache.org by Mark Moseley <mo...@gmail.com> on 2011/01/13 20:13:32 UTC

Newbie Replication/Cluster Question

I'm just starting to play with Cassandra, so this is almost certainly
a conceptual problem on my part; apologies in advance. I was testing
out how I'd do things like bring up new nodes. I've got a simple
2-node cluster, with my only keyspace having replication_factor=2.
This is on 32-bit Debian Squeeze, with Java(TM) SE Runtime Environment
(build 1.6.0_22-b04) and the just-released 0.7.0 binaries. The
configuration is pretty minimal apart from using the
SimpleAuthentication module.
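
For reference, the keyspace was created in cassandra-cli with
something roughly like this (the keyspace name here is just a
stand-in, and I'm typing the syntax from memory, so treat it as
approximate):

  create keyspace TestKS with replication_factor = 2 and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';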

The issue is that whenever I kill a node in the cluster, wipe its
datadir (i.e. rm -rf /var/lib/cassandra/*), and try to bootstrap it
back into the cluster, it seems to join the cluster and chug along
until it keels over and dies with the error below (this happens
whether both nodes were up while the data was being written or only a
single node was):

 INFO [main] 2011-01-13 13:56:23,385 StorageService.java (line 399)
Bootstrapping
ERROR [main] 2011-01-13 13:56:23,402 AbstractCassandraDaemon.java
(line 234) Exception encountered during startup.
java.lang.IllegalStateException: replication factor (2) exceeds number
of endpoints (1)
	at org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
	at org.apache.cassandra.locator.AbstractReplicationStrategy.getRangeAddresses(AbstractReplicationStrategy.java:204)
	at org.apache.cassandra.dht.BootStrapper.getRangesWithSources(BootStrapper.java:198)
	at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:83)
	at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:417)
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:361)
	at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:161)
	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55)
	at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:217)
	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134)
Exception encountered during startup.
java.lang.IllegalStateException: replication factor (2) exceeds number
of endpoints (1)
	at org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
	at org.apache.cassandra.locator.AbstractReplicationStrategy.getRangeAddresses(AbstractReplicationStrategy.java:204)
	at org.apache.cassandra.dht.BootStrapper.getRangesWithSources(BootStrapper.java:198)
	at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:83)
	at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:417)
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:361)
	at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:161)
	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55)
	at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:217)
	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134)
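
For the record, the exact steps I'm using to wipe and re-add the node
are roughly the following (the way I stop the daemon and the paths are
from memory, so treat this as a sketch rather than a recipe):

  # on the node being wiped (10.1.58.4 in this case)
  pkill -f CassandraDaemon      # stop the node, however you normally do it
  rm -rf /var/lib/cassandra/*   # wipe data, commitlog and saved_caches
  # cassandra.yaml still lists 10.1.58.3 as a seed and has auto_bootstrap: true
  bin/cassandra                 # start it back up

  # then watch it try to join, from the surviving node
  nodetool -h 10.1.58.3 ring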


Seems like something of a chicken-and-egg problem: it doesn't like
there being only 1 node, but it also won't let node 2 join. Since I've
been messing with Cassandra for only a couple of days, I'm assuming
I'm doing something wrong, but the only Google results I can find for
the above error are a couple of 4+ month-old tickets that all sound
resolved. It's probably worth mentioning that if both nodes are
started when I create the keyspace, the cluster appears to work just
fine and I can start/stop either node and get at any piece of data.

The nodetool ring output looks like this:

> Prior to starting 10.1.58.4 and then for a while after startup
Address         Status State   Load            Owns    Token
10.1.58.3       Up     Normal  524.99 KB       100.00%
74198390702807803312208811144092384306

> 10.1.58.4 seems to be joining
Address         Status State   Load            Owns    Token

74198390702807803312208811144092384306
10.1.58.4       Up     Joining 72.06 KB        56.66%
460947270041113367229815744049079597
10.1.58.3       Up     Normal  524.99 KB       43.34%
74198390702807803312208811144092384306

> Java exception, back to just 10.1.58.3
Address         Status State   Load            Owns    Token
10.1.58.3       Up     Normal  524.99 KB       100.00%
74198390702807803312208811144092384306

Re: Newbie Replication/Cluster Question

Posted by Mark Moseley <mo...@gmail.com>.
On Fri, Jan 14, 2011 at 4:29 PM, Aaron Morton <aa...@thelastpickle.com> wrote:
> Here are some slides I did last year that have a simple explanation of RF: http://www.slideshare.net/mobile/aaronmorton/well-railedcassandra24112010-5901169
>
> Short version is, generally no single node contains all the data in the db.
> Normally the RF is going to be less than the number of nodes, and the higher the RF, the more concurrent node failures you can handle (when writing at quorum).
>
> - At RF 3 you can keep reading and writing with 1 node down. If you lose a second node, the cluster will appear to be down for a portion of the keys; the portion depends on the total number of nodes.
> - At RF 5 the cluster will be up for all keys with 2 nodes down. If you have 3 down, the cluster will appear down for only a portion of the keys; again, the portion depends on the total number of nodes.
>
> It's a bit more complicated, though: when I say 'node is down' I mean one of the nodes that the key would have been written to is down (the 3 or 5 above). So if you had 10 nodes and RF 5, you could have 4 nodes down and the cluster still be available for all keys, so long as there are still 3 "natural endpoints" for each key.
>
> Hope that helps.
>
> Aaron
>
> On 15/01/2011, at 8:52 AM, Mark Moseley <mo...@gmail.com> wrote:
>
>>> Perhaps the better question would be, if I have a two node cluster and
>>> I want to be able to lose one box completely and replace it (without
>>> losing the cluster), what settings would I need? Or is that an
>>> impossible scenario? In production, I'd imagine a 3 node cluster being
>>> the minimum but even there I could see each box having a full replica,
>>> but probably not beyond 3.
>>
>> Or perhaps, in the case of losing a box completely in a 2-node RF=2
>> cluster, do I need to lower the replication_factor on the still-alive
>> box, bootstrap the replaced node back in, and then change the
>> replication_factor back to 2?
>

Excellent, thanks! I'll definitely be checking those out.  I just want
to make sure I've got the hang of DR before we start deploying
Cassandra, and I'd hate to figure all this out later on with angry
customers standing over my shoulder :)

Re: Newbie Replication/Cluster Question

Posted by Aaron Morton <aa...@thelastpickle.com>.
Here are some slides I did last year that have a simple explanation of RF: http://www.slideshare.net/mobile/aaronmorton/well-railedcassandra24112010-5901169

Short version is, generally no single node contains all the data in the db. 
Normally the RF is going to be less than the number of nodes, and the higher the RF, the more concurrent node failures you can handle (when writing at quorum).

- At RF 3 you can keep reading and writing with 1 node down. If you lose a second node, the cluster will appear to be down for a portion of the keys; the portion depends on the total number of nodes.
- At RF 5 the cluster will be up for all keys with 2 nodes down. If you have 3 down, the cluster will appear down for only a portion of the keys; again, the portion depends on the total number of nodes.

It's a bit more complicated, though: when I say 'node is down' I mean one of the nodes that the key would have been written to is down (the 3 or 5 above). So if you had 10 nodes and RF 5, you could have 4 nodes down and the cluster still be available for all keys, so long as there are still 3 "natural endpoints" for each key.
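
To put rough numbers on that, quorum is just a majority of the RF
replicas for a given key:

  quorum = floor(RF / 2) + 1
  RF = 2  ->  quorum = 2  (no replica of a key can be down for quorum ops)
  RF = 3  ->  quorum = 2  (1 of a key's 3 replicas can be down)
  RF = 5  ->  quorum = 3  (2 of a key's 5 replicas can be down)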

Hope that helps.

Aaron

On 15/01/2011, at 8:52 AM, Mark Moseley <mo...@gmail.com> wrote:

>> Perhaps the better question would be, if I have a two node cluster and
>> I want to be able to lose one box completely and replace it (without
>> losing the cluster), what settings would I need? Or is that an
>> impossible scenario? In production, I'd imagine a 3 node cluster being
>> the minimum but even there I could see each box having a full replica,
>> but probably not beyond 3.
> 
> Or perhaps, in the case of losing a box completely in a 2-node RF=2
> cluster, do I need to lower the replication_factor on the still-alive
> box, bootstrap the replaced node back in, and then change the
> replication_factor back to 2?

Re: Newbie Replication/Cluster Question

Posted by Mark Moseley <mo...@gmail.com>.
> Perhaps the better question would be, if I have a two node cluster and
> I want to be able to lose one box completely and replace it (without
> losing the cluster), what settings would I need? Or is that an
> impossible scenario? In production, I'd imagine a 3 node cluster being
> the minimum but even there I could see each box having a full replica,
> but probably not beyond 3.

Or perhaps, in the case of losing a box completely in a 2-node RF=2
cluster, do I need to lower the replication_factor on the still-alive
box, bootstrap the replaced node back in, and then change the
replication_factor back to 2?
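
i.e. something along these lines on the surviving box (the
cassandra-cli and nodetool syntax is from memory and the keyspace name
is just an example, so this is only a sketch of the idea, not a tested
recipe):

  # in cassandra-cli on 10.1.58.3: drop the RF so one endpoint can satisfy it
  update keyspace TestKS with replication_factor = 1;

  # bootstrap the rebuilt node back into the ring, then restore the RF
  update keyspace TestKS with replication_factor = 2;

  # and repair so the new node actually gets copies of the existing data
  nodetool -h 10.1.58.4 repair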

Re: Newbie Replication/Cluster Question

Posted by Mark Moseley <mo...@gmail.com>.
On Thu, Jan 13, 2011 at 2:32 PM, Mark Moseley <mo...@gmail.com> wrote:
> On Thu, Jan 13, 2011 at 1:08 PM, Gary Dusbabek <gd...@gmail.com> wrote:
>> It is impossible to properly bootstrap a new node into a system where
>> there are not enough nodes to satisfy the replication factor.  The
>> cluster as it stands doesn't contain all the data you are asking it to
>> replicate on the new node.
>
> Ok, maybe I'm thinking of replication_factor backwards. I took it to
> mean how many nodes would have *full* copies of the whole of the
> keyspace's data, in which case, with my keyspace at
> replication_factor=2, the still-alive node would have 100% of the data
> to replicate to the wiped-clean node, so all the data would be there
> to bootstrap from. I was assuming replication_factor=2 in a 2-node
> cluster == both nodes having a full replica of the data. Do I have
> that wrong?
>
> What's also confusing is that I did this same test starting from a
> single, not-yet-clustered node (interestingly, it doesn't complain
> about replication_factor > # of nodes in that case), so unless it was
> throwing away data as I was inserting it, all the data should be
> there.
>
> Is the general rule then that the max replication factor must be
> #_of_nodes-1? If replication_factor==#_of_nodes and you lost a box, it
> seems like your cluster would be toast.


Perhaps the better question would be, if I have a two node cluster and
I want to be able to lose one box completely and replace it (without
losing the cluster), what settings would I need? Or is that an
impossible scenario? In production, I'd imagine a 3 node cluster being
the minimum but even there I could see each box having a full replica,
but probably not beyond 3.

Re: Newbie Replication/Cluster Question

Posted by Mark Moseley <mo...@gmail.com>.
On Thu, Jan 13, 2011 at 1:08 PM, Gary Dusbabek <gd...@gmail.com> wrote:
> It is impossible to properly bootstrap a new node into a system where
> there are not enough nodes to satisfy the replication factor.  The
> cluster as it stands doesn't contain all the data you are asking it to
> replicate on the new node.

Ok, maybe I'm thinking of replication_factor backwards. I took it to
mean how many nodes would have *full* copies of the whole of the
keyspace's data, in which case, with my keyspace at
replication_factor=2, the still-alive node would have 100% of the data
to replicate to the wiped-clean node, so all the data would be there
to bootstrap from. I was assuming replication_factor=2 in a 2-node
cluster == both nodes having a full replica of the data. Do I have
that wrong?

What's also confusing is that I did this same test starting from a
single, not-yet-clustered node (interestingly, it doesn't complain
about replication_factor > # of nodes in that case), so unless it was
throwing away data as I was inserting it, all the data should be
there.

Is the general rule then that the max replication factor must be
#_of_nodes-1? If replication_factor==#_of_nodes and you lost a box, it
seems like your cluster would be toast.

Re: Newbie Replication/Cluster Question

Posted by Gary Dusbabek <gd...@gmail.com>.
It is impossible to properly bootstrap a new node into a system where
there are not enough nodes to satisfy the replication factor.  The
cluster as it stands doesn't contain all the data you are asking it to
replicate on the new node.
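
Put another way, roughly what that startup check amounts to here:

  replication factor of the keyspace       : 2
  nodes already in the ring to stream from : 1  (just 10.1.58.3)
  2 > 1  ->  "replication factor (2) exceeds number of endpoints (1)"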

Gary.


On Thu, Jan 13, 2011 at 13:13, Mark Moseley <mo...@gmail.com> wrote:
> I'm just starting to play with Cassandra, so this is almost certainly
> a conceptual problem on my part; apologies in advance. I was testing
> out how I'd do things like bring up new nodes. I've got a simple
> 2-node cluster, with my only keyspace having replication_factor=2.
> This is on 32-bit Debian Squeeze, with Java(TM) SE Runtime Environment
> (build 1.6.0_22-b04) and the just-released 0.7.0 binaries. The
> configuration is pretty minimal apart from using the
> SimpleAuthentication module.
>
> The issue is that whenever I kill a node in the cluster, wipe its
> datadir (i.e. rm -rf /var/lib/cassandra/*), and try to bootstrap it
> back into the cluster, it seems to join the cluster and chug along
> until it keels over and dies with the error below (this happens
> whether both nodes were up while the data was being written or only a
> single node was):
>
>  INFO [main] 2011-01-13 13:56:23,385 StorageService.java (line 399)
> Bootstrapping
> ERROR [main] 2011-01-13 13:56:23,402 AbstractCassandraDaemon.java
> (line 234) Exception encountered during startup.
> java.lang.IllegalStateException: replication factor (2) exceeds number
> of endpoints (1)
>        at org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
>        at org.apache.cassandra.locator.AbstractReplicationStrategy.getRangeAddresses(AbstractReplicationStrategy.java:204)
>        at org.apache.cassandra.dht.BootStrapper.getRangesWithSources(BootStrapper.java:198)
>        at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:83)
>        at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:417)
>        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:361)
>        at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:161)
>        at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55)
>        at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:217)
>        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134)
> Exception encountered during startup.
> java.lang.IllegalStateException: replication factor (2) exceeds number
> of endpoints (1)
>        at org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
>        at org.apache.cassandra.locator.AbstractReplicationStrategy.getRangeAddresses(AbstractReplicationStrategy.java:204)
>        at org.apache.cassandra.dht.BootStrapper.getRangesWithSources(BootStrapper.java:198)
>        at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:83)
>        at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:417)
>        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:361)
>        at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:161)
>        at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55)
>        at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:217)
>        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134)
>
>
> Seems like something of a chicken-and-egg problem: it doesn't like
> there being only 1 node, but it also won't let node 2 join. Since I've
> been messing with Cassandra for only a couple of days, I'm assuming
> I'm doing something wrong, but the only Google results I can find for
> the above error are a couple of 4+ month-old tickets that all sound
> resolved. It's probably worth mentioning that if both nodes are
> started when I create the keyspace, the cluster appears to work just
> fine and I can start/stop either node and get at any piece of data.
>
> The nodetool ring output looks like this:
>
>> Prior to starting 10.1.58.4 and then for a while after startup
> Address         Status State   Load            Owns    Token
> 10.1.58.3       Up     Normal  524.99 KB       100.00%
> 74198390702807803312208811144092384306
>
>> 10.1.58.4 seems to be joining
> Address         Status State   Load            Owns    Token
>
> 74198390702807803312208811144092384306
> 10.1.58.4       Up     Joining 72.06 KB        56.66%
> 460947270041113367229815744049079597
> 10.1.58.3       Up     Normal  524.99 KB       43.34%
> 74198390702807803312208811144092384306
>
>> Java exception, back to just 10.1.58.3
> Address         Status State   Load            Owns    Token
> 10.1.58.3       Up     Normal  524.99 KB       100.00%
> 74198390702807803312208811144092384306
>