You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@cassandra.apache.org by Jeremy Stribling <st...@nicira.com> on 2011/07/02 02:46:00 UTC

incomplete schema sync for new node

Hi all,

I'm running into a problem with Cassandra, where a new node coming up 
seems to only get an incomplete set of schema mutations when 
bootstrapping, and as a result hits an "IllegalStateException: 
replication factor (3) exceeds number of endpoints (2)" error.

I will describe the sequence of events below as I see them, but first I 
need to warn you that I run Cassandra in a very non-standard way.  I 
embed it in a JVM, along with Zookeeper, and other classes for a product 
we are working on.  We need to bring nodes up and down dynamically in 
our product, including going from one node to three nodes, and back down 
to one, at any time.  If we ever drop below three nodes, we have code 
that sets the replication factor of our keyspaces to 1; similarly, 
whenever we have three or more nodes, we change the replication factor 
to 3.  I know this is frowned upon by the community, but we're stuck 
with doing it this way for now.

Ok, here's the scenario:

1) Node 50.0.0.4 bootstraps into a cluster consisting of nodes 50.0.0.2 
and 50.0.0.3.
2) Once 50.0.0.4 is fully bootstrapped, we change the replication factor 
for our two keyspaces to 3.
3) Then node 50.0.0.2 is taken down permanently, and we change the 
replication factor back down to 1.
4) We then remove node 50.0.0.2's tokens using the removeToken call on 
node 50.0.0.3.
5) Then we start node 50.0.0.5, and have it join the cluster using 
50.0.0.3 and 50.0.0.4 as seeds.
6) 50.0.0.5 starts receiving schema mutations to get it up to speed; the 
last one it receives (7d51e757-a40b-11e0-a98d-65ed1eced995) has the 
replication factor at 3.  However, there should be more schema updates 
after this that never arrive (you can see them arrive at 50.0.0.4 while 
it is bootstrapping).
7) Minutes after receiving this last mutation, node 50.0.0.5 hits the 
IllegalStateException I've listed above, and I think for that reason 
never successfully joins the cluster.

My question is why doesn't node 50.0.0.5 receive the schema updates that 
follow 7d51e757-a40b-11e0-a98d-65ed1eced995?  (For example, 
8fc8820d-a40c-11e0-9eaf-6720e49624c2 is present in 50.0.0.4's log and 
sets the replication factor back down to 1.)

I've put logs for nodes 50.0.0.3/4/5 at 
http://pdos.csail.mit.edu/~strib/cassandra_logs.tgz .  The logs are 
pretty messy because they includes log messages from both Zookeeper and 
our product code -- sorry about that.  Also, I think the clock on node 
50.0.0.4 is a few minutes ahead of the other nodes' clocks.

I also noticed in 50.0.0.4's log the following exceptions:

2011-07-01 18:00:49,832 76315 [HintedHandoff:1] ERROR 
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor  - Error in 
ThreadPoolExecutor
java.lang.RuntimeException: java.lang.RuntimeException: Could not reach 
schema agreement with /50.0.0.3 in 60000ms
         at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:662)

I don't know if that's related or not.

Thanks in advance,

Jeremy

Re: incomplete schema sync for new node

Posted by aaron morton <aa...@thelastpickle.com>.

First, move off 0.7.2 if you can. While you may not get hit with this  https://github.com/apache/cassandra/blob/cassandra-0.7.6-2/NEWS.txt#L61 you may have trouble with this https://issues.apache.org/jira/browse/CASSANDRA-2554

For background read the section on Staring Up and on Concurrency here http://wiki.apache.org/cassandra/LiveSchemaUpdates

>> java.lang.RuntimeException: java.lang.RuntimeException: Could not reach schema agreement with /50.0.0.3 in 60000ms

> 
Means you have split brain schemas in the your cluster. use describe cluster in the cli to see how many versions of the schema you have out there. 

The exception is thrown when the placement strategy (Simple or OldNTS) is trying to calculate the Natural Endpoints for a Token  (AbstractReplicationStrategy.calculateNaturalEndpoints()) . This can happen reading/writing a key, or in your case when the node is bootstrapping and trying to work out which endpoints are responsible for the token ranges. AFAIK adding the migrations if an online process, the server is up and running while they are been added. So if there is anything that happens while the schema is invalid that requires a valid schema you will get the error. 

All the "Previous version mismatch. cannot apply." errors in the log for 0.4 mean it got a migration from someone but the migration was received out of order. The current version on the node is not the version that was present when this migration was applied. 
 
The simple answer is stop doing do what you're doing, it sounds dangerous and inefficient to me. AFAIK it's not what the schema migrations were designed to do and moving from CL 1 to 3 will increase the repair workload.Aside from the risks of changing RF up and down a lot.  

The long answer may be to always apply the schema changes on the same node; check there is a single version of the schema before adding a new one; and take a look at monkeying around with the Schema and Migrations CF's in the System KS to delete migrations you want skipped. 

Am frowning and tutting and stoking my beard. 

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 2 Jul 2011, at 12:58, Jeremy Stribling wrote:

> Oops, forgot to mention that we're using Cassandra 0.7.2.
> 
> On 07/01/2011 05:46 PM, Jeremy Stribling wrote:
>> Hi all,
>> 
>> I'm running into a problem with Cassandra, where a new node coming up seems to only get an incomplete set of schema mutations when bootstrapping, and as a result hits an "IllegalStateException: replication factor (3) exceeds number of endpoints (2)" error.
>> 
>> I will describe the sequence of events below as I see them, but first I need to warn you that I run Cassandra in a very non-standard way.  I embed it in a JVM, along with Zookeeper, and other classes for a product we are working on.  We need to bring nodes up and down dynamically in our product, including going from one node to three nodes, and back down to one, at any time.  If we ever drop below three nodes, we have code that sets the replication factor of our keyspaces to 1; similarly, whenever we have three or more nodes, we change the replication factor to 3.  I know this is frowned upon by the community, but we're stuck with doing it this way for now.
>> 
>> Ok, here's the scenario:
>> 
>> 1) Node 50.0.0.4 bootstraps into a cluster consisting of nodes 50.0.0.2 and 50.0.0.3.
>> 2) Once 50.0.0.4 is fully bootstrapped, we change the replication factor for our two keyspaces to 3.
>> 3) Then node 50.0.0.2 is taken down permanently, and we change the replication factor back down to 1.
>> 4) We then remove node 50.0.0.2's tokens using the removeToken call on node 50.0.0.3.
>> 5) Then we start node 50.0.0.5, and have it join the cluster using 50.0.0.3 and 50.0.0.4 as seeds.
>> 6) 50.0.0.5 starts receiving schema mutations to get it up to speed; the last one it receives (7d51e757-a40b-11e0-a98d-65ed1eced995) has the replication factor at 3.  However, there should be more schema updates after this that never arrive (you can see them arrive at 50.0.0.4 while it is bootstrapping).
>> 7) Minutes after receiving this last mutation, node 50.0.0.5 hits the IllegalStateException I've listed above, and I think for that reason never successfully joins the cluster.
>> 
>> My question is why doesn't node 50.0.0.5 receive the schema updates that follow 7d51e757-a40b-11e0-a98d-65ed1eced995?  (For example, 8fc8820d-a40c-11e0-9eaf-6720e49624c2 is present in 50.0.0.4's log and sets the replication factor back down to 1.)
>> 
>> I've put logs for nodes 50.0.0.3/4/5 at http://pdos.csail.mit.edu/~strib/cassandra_logs.tgz .  The logs are pretty messy because they includes log messages from both Zookeeper and our product code -- sorry about that.  Also, I think the clock on node 50.0.0.4 is a few minutes ahead of the other nodes' clocks.
>> 
>> I also noticed in 50.0.0.4's log the following exceptions:
>> 
>> 2011-07-01 18:00:49,832 76315 [HintedHandoff:1] ERROR org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor  - Error in ThreadPoolExecutor
>> java.lang.RuntimeException: java.lang.RuntimeException: Could not reach schema agreement with /50.0.0.3 in 60000ms
>>        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
>>        at java.lang.Thread.run(Thread.java:662)
>> 
>> I don't know if that's related or not.
>> 
>> Thanks in advance,
>> 
>> Jeremy
>>

Re: incomplete schema sync for new node

Posted by Jeremy Stribling <st...@nicira.com>.

Oops, forgot to mention that we're using Cassandra 0.7.2.

On 07/01/2011 05:46 PM, Jeremy Stribling wrote:
> Hi all,
>
> I'm running into a problem with Cassandra, where a new node coming up 
> seems to only get an incomplete set of schema mutations when 
> bootstrapping, and as a result hits an "IllegalStateException: 
> replication factor (3) exceeds number of endpoints (2)" error.
>
> I will describe the sequence of events below as I see them, but first 
> I need to warn you that I run Cassandra in a very non-standard way.  I 
> embed it in a JVM, along with Zookeeper, and other classes for a 
> product we are working on.  We need to bring nodes up and down 
> dynamically in our product, including going from one node to three 
> nodes, and back down to one, at any time.  If we ever drop below three 
> nodes, we have code that sets the replication factor of our keyspaces 
> to 1; similarly, whenever we have three or more nodes, we change the 
> replication factor to 3.  I know this is frowned upon by the 
> community, but we're stuck with doing it this way for now.
>
> Ok, here's the scenario:
>
> 1) Node 50.0.0.4 bootstraps into a cluster consisting of nodes 
> 50.0.0.2 and 50.0.0.3.
> 2) Once 50.0.0.4 is fully bootstrapped, we change the replication 
> factor for our two keyspaces to 3.
> 3) Then node 50.0.0.2 is taken down permanently, and we change the 
> replication factor back down to 1.
> 4) We then remove node 50.0.0.2's tokens using the removeToken call on 
> node 50.0.0.3.
> 5) Then we start node 50.0.0.5, and have it join the cluster using 
> 50.0.0.3 and 50.0.0.4 as seeds.
> 6) 50.0.0.5 starts receiving schema mutations to get it up to speed; 
> the last one it receives (7d51e757-a40b-11e0-a98d-65ed1eced995) has 
> the replication factor at 3.  However, there should be more schema 
> updates after this that never arrive (you can see them arrive at 
> 50.0.0.4 while it is bootstrapping).
> 7) Minutes after receiving this last mutation, node 50.0.0.5 hits the 
> IllegalStateException I've listed above, and I think for that reason 
> never successfully joins the cluster.
>
> My question is why doesn't node 50.0.0.5 receive the schema updates 
> that follow 7d51e757-a40b-11e0-a98d-65ed1eced995?  (For example, 
> 8fc8820d-a40c-11e0-9eaf-6720e49624c2 is present in 50.0.0.4's log and 
> sets the replication factor back down to 1.)
>
> I've put logs for nodes 50.0.0.3/4/5 at 
> http://pdos.csail.mit.edu/~strib/cassandra_logs.tgz .  The logs are 
> pretty messy because they includes log messages from both Zookeeper 
> and our product code -- sorry about that.  Also, I think the clock on 
> node 50.0.0.4 is a few minutes ahead of the other nodes' clocks.
>
> I also noticed in 50.0.0.4's log the following exceptions:
>
> 2011-07-01 18:00:49,832 76315 [HintedHandoff:1] ERROR 
> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor  - Error 
> in ThreadPoolExecutor
> java.lang.RuntimeException: java.lang.RuntimeException: Could not 
> reach schema agreement with /50.0.0.3 in 60000ms
>         at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
>
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
>
>         at java.lang.Thread.run(Thread.java:662)
>
> I don't know if that's related or not.
>
> Thanks in advance,
>
> Jeremy
>