Posted to user@cassandra.apache.org by Jake Maizel <ja...@soundcloud.com> on 2010/11/19 12:30:24 UTC

Questions about RackAwareStrategy and Multiple Data Centers

Hello,

(I tried my best to read all I could before posting but I really
couldn't find info to answer my questions.  So, here's my post.)

I have some questions.

Background:

We have a 6-node Cassandra cluster running in one data center with the
following config:

Cassandra 0.6.6
Replicas: 3
Placement: RackUnaware originally
Using Standard data storage and mmap index storage
RAM 16GB
Per node load: roughly 100GB +- 20

We then added a second set of six nodes in a second data center, with
the goal of migrating data to this new DC and then shutting down the
nodes in the original one.  We switched all nodes to RackAwareStrategy
and restarted.  We set up seeds on one of the new nodes pointing to
three of the old nodes (nodes 2, 4, 6).  We did not add any of the
new nodes as seeds on the old ones.
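
For reference, the storage-conf.xml changes looked roughly like this
(a sketch; the keyspace name and addresses are illustrative, and
everything else was left at its defaults):

    <Keyspaces>
      <Keyspace Name="OurKeyspace">
        <!-- was org.apache.cassandra.locator.RackUnawareStrategy -->
        <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackAwareStrategy</ReplicaPlacementStrategy>
        <ReplicationFactor>3</ReplicationFactor>
      </Keyspace>
    </Keyspaces>

    <!-- on the new nodes: point at three of the old nodes -->
    <Seeds>
      <Seed>10.0.1.2</Seed>
      <Seed>10.0.1.4</Seed>
      <Seed>10.0.1.6</Seed>
    </Seeds>

    <!-- standard data storage, mmap'd index storage -->
    <DiskAccessMode>mmap_index_only</DiskAccessMode>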

All went according to plan: the new nodes were injected into the ring
halfway between each of the original nodes' tokens.  This just worked
magically, as advertised.  :)
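
For anyone repeating this: with the RandomPartitioner the halfway
tokens are easy to compute by hand and can be set via InitialToken in
storage-conf.xml before a new node bootstraps (a sketch; the token
values are illustrative for an evenly spaced six-node ring):

    # RandomPartitioner tokens are integers in [0, 2**127); the new
    # node's token is the midpoint of its two neighbours' tokens
    python -c 'left = 0; right = 2**127 // 6; print((left + right) // 2)'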

We ran nodetool repair on the new nodes, one at a time, waiting until
activity finished (indicated by zero pending compaction and AE stages).
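
Concretely, for each node we did something like this (host name is
illustrative; 8080 is the default JMX port in 0.6):

    # run repair on one node, then watch the anti-entropy and
    # compaction stages drain before moving on to the next node
    nodetool -host cass-dc2-1 repair
    watch -n 60 'nodetool -host cass-dc2-1 tpstats | egrep "AE-SERVICE|COMPACTION"'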

We then moved to running repair on the original nodes.  This is where
my questions came up.

We see that after starting repair on one node, we get lots of GC
(However, we are not swapping and disk io seems fine).  We also see
increases in the pending queue for AE stages (Seems normal, on the
order of 40-80 pending stages).  What doesn't seem normal is that we
see a large increase in the AE pending queue on all other nodes not
running repair (I would expect this on neighbors, but not all nodes)
and it seems to take forever for these queues to drain (Forever = over
24 hrs).
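
We watched this with a small loop over the whole ring (host names are
illustrative; $3 is the Pending column in 0.6's tpstats output):

    # print the anti-entropy stage's pending count on every node
    for h in cass-dc1-{1..6} cass-dc2-{1..6}; do
      printf '%s: ' "$h"
      nodetool -host "$h" tpstats | awk '/AE-SERVICE/ {print $3}'
    done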

Here are some questions I have (I can provide any additional info required):

1. If a node we run repair on finishes, indicated by compaction and AE
being 0, but the next node we want to repair still has non-zero queues
for compaction and AE, can we still start up the repair?
2. What is the effect of running repair on more than one node at a
time under 0.6.6?  I realize it's not recommended, but I accidentally
did this and am curious about the effect.
3. Is large GC activity normal during a repair outside the documented
"GC Storm" cases?

By the way, really great work on Cassandra from an operations POV.
I've enjoyed working with it.

Regards and thanks for any help.

Jake

--
Jake Maizel
Network Operations
Soundcloud

Mail & GTalk: jake@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE

Re: Questions about RackAwareStrategy and Multiple Data Centers

Posted by Jonathan Ellis <jb...@gmail.com>.
On Fri, Nov 19, 2010 at 5:30 AM, Jake Maizel <ja...@soundcloud.com> wrote:
> We see that after starting repair on one node, we get lots of GC
> (However, we are not swapping and disk io seems fine).  We also see
> increases in the pending queue for AE stages (Seems normal, on the
> order of 40-80 pending stages).  What doesn't seem normal is that we
> see a large increase in the AE pending queue on all other nodes not
> running repair (I would expect this on neighbors, but not all nodes)
> and it seems to take forever for these queues to drain (Forever = over
> 24 hrs).

Sounds like https://issues.apache.org/jira/browse/CASSANDRA-1674.
(Fixed for 0.6.9.)

> Here are some questions I have (I can provide any additional info required):
>
> 1. If a node we run repair on finishes, indicated by compaction and AE
> being 0, but the next node we want to repair still has non-zero queues
> for compaction and AE, can we still start up the repair?

I think having AE empty is the important one, but I'd wait for
everything to quiesce, to be safe.
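
Something like this makes a workable gate before kicking off the next
repair (a sketch; the host name is illustrative and $3 is the Pending
column in 0.6's tpstats output):

    # block until the compaction and anti-entropy stages both report
    # zero pending tasks on the node that was just repaired
    while nodetool -host cass-dc1-1 tpstats |
        awk '/AE-SERVICE|COMPACTION/ && $3 > 0 {busy=1} END {exit !busy}'
    do
      sleep 60
    done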

> 2. What is the effect of running repair on more than one node at a
> time under 0.6.6?  I realize it's not recommended, but I accidentally
> did this and am curious about the effect.

Often the repairs will stomp on each other's internal state and
neither will finish.

> 3. Is large GC activity normal during a repair outside the documented
> "GC Storm" cases?

Yes.  Repair does a lot of object allocation.
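
If you want to confirm it's ordinary allocation churn rather than
something pathological, GC logging is cheap to turn on (a sketch; 0.6
reads its JVM options from bin/cassandra.in.sh, and the log path is
illustrative):

    # append to JVM_OPTS in bin/cassandra.in.sh, then restart the node
    JVM_OPTS="$JVM_OPTS -verbose:gc -XX:+PrintGCDetails \
        -Xloggc:/var/log/cassandra/gc.log"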

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com