Posted to user@cassandra.apache.org by Fd Habash <fm...@gmail.com> on 2018/03/02 21:29:12 UTC

RE: On a 12-node Cluster, Starting C* on a Seed Node Increases ReadLatency from 150ms to 1.5 sec.

I understand you use Apache Cassandra 2.2.8. :)
- Yes. It was a typo

In Apache Cassandra 2.2.8, this triggers incremental repairs I believe,
- Yes, incremental is the default as of 2.2, and we run 'repair -pr' (primary range) on every node in the cluster

. Did you replace the node in-place?
- Yes. We removed it from its seed provider list first. Otherwise, it won’t bootstrap.

You should be able to have nodes going down, or being fairly slow …
- When we stopped C* on this node, read performance recovered well. Once it was started again, and now with no repairs running at all, latency increased back to over 1.5 secs. This affected the node (in AZ 1) and the other 8 nodes (4 in AZ 2 and 4 in AZ 3). That is, it slowed down the other 2 AZs.
- The application reads with CL=LOCAL_QUORUM
- This is the behavior I do not understand. There is no streaming.
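(A quick way to tell whether the slow node is dragging down coordinators in the other AZs, or whether those nodes are also slow locally, is to compare coordinator-level and table-level histograms; the keyspace/table names below are only placeholders:)

    # Coordinator-level read latency, i.e. what clients see through this node
    nodetool proxyhistograms

    # Local read latency for one table on this node
    nodetool cfhistograms my_keyspace my_table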

My coworker Alexander wrote about this a few months ago …
- We have been looking into Reaper for the past 2 months. Work in progress.

And thank you for the thorough response. 


From: Alain RODRIGUEZ
Sent: Friday, March 2, 2018 11:43 AM
To: user@cassandra.apache.org
Subject: Re: On a 12-node Cluster, Starting C* on a Seed Node Increases ReadLatency from 150ms to 1.5 sec.

Hello,

This is a 2.8.8. cluster

That's an exotic version!

I understand you use Apache Cassandra 2.2.8. :)

This single node was a seed node and it was running a ‘repair -pr’ at the time 

In Apache Cassandra 2.2.8, this triggers incremental repairs I believe, and they are relatively (some would say completely) broken. Let's say they have caused a lot of trouble in many cases. If I am wrong and you are not running incremental repairs (the default in your version, off the top of my head), then your node might not have enough resources available to handle both the repair and the standard load. It might be something to check.
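(If it helps, in 2.2 you can force a classic full repair instead of the incremental default; the keyspace name below is only a placeholder:)

    # -full forces a non-incremental repair; -pr limits it to the node's primary ranges
    nodetool repair -full -pr my_keyspace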

Consequences of incremental repairs are:

- Keeping SSTables split between repaired and unrepaired sets, increasing the number of SSTables
- Anti-compaction (splits SSTables) is used to keep them grouped.

This induces a lot of performance downsides, such as (but not only):

- Inefficient tombstone eviction
- More disk hits for the same queries
- More compaction work

Machines then perform very poorly.
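(A rough way to see this repaired/unrepaired split on disk is to look at the "Repaired at" field of each SSTable; the data path below is only an example, and a value of 0 means the SSTable was never incrementally repaired:)

    # Count SSTables by their "Repaired at" value (0 = unrepaired)
    for f in /var/lib/cassandra/data/my_keyspace/my_table-*/*-Data.db; do
        sstablemetadata "$f" | grep 'Repaired at'
    done | sort | uniq -c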

My coworker Alexander wrote about this a few months ago, it might be of interest: http://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html
If repairs are a pain point, you might be interested in checking http://cassandra-reaper.io/, which aims at making this operation easier and more efficient.

I would say the fact this node is a seed node did not matter here; it is a coincidence that you picked a seed node for the repair. Seed nodes mostly work like any other node, except during bootstrap.

So we decided to bootstrap it.

I am not sure what happens when bootstrapping a seed node. I have always removed it from the seed list first. Did you replace the node in-place? I guess if you had no warnings and have no consistency issues, it's all good.

All we were able to see is that the seed node in question was different in that it had 5000 sstables while all others had around 2300. After bootstrap, seed node sstables reduced to 2500.

I would say this is fairly common (even more so when using vnodes), as streaming of the data from all the other nodes is fast and compaction might take a while to catch up.
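(To watch that catch-up, both the per-table SSTable count and the pending compactions are visible from nodetool; the keyspace/table names below are placeholders:)

    # SSTable count for one table (look for the "SSTable count" line)
    nodetool cfstats my_keyspace.my_table

    # Pending compactions working through the freshly streamed SSTables
    nodetool compactionstats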

Why would starting C* on a single seed node affect the cluster this bad? 

That's a fair question. It depends on factors such as the client configuration, the replication factor, and the consistency level used. If the slow node is involved in some reads, then the average latency will go up.

You should be able to have nodes going down, or being fairly slow, and still use the right nodes if the client is recent enough and well configured.
 
Is it gossip?

It might be; there were gossip issues, but I believe in previous versions and / or on bigger clusters. I would dig into a 'repair' issue first, it seems more probable to me.

I hope this helped,

C*heers,
-----------------------
Alain Rodriguez - @arodream - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-03-02 14:42 GMT+00:00 Fd Habash <fm...@gmail.com>:
This is a 2.8.8. cluster with three AWS AZs, each with 4 nodes.
 
A few days ago we noticed a single node’s read latency reaching 1.5 secs; there were 8 others with read latencies going up to near 900 ms.
 
This single node was a seed node and it was running a ‘repair -pr’ at the time. We intervened as follows …
 
• Stopping compactions during the repair did not improve latency.
• Killing the repair brought latency down to 200 ms on the seed node and the other 8.
• Restarting C* on the seed node increased latency again, back to near 1.5 secs on the seed and the other 8. At this point, there was no repair running and compactions were running. We left them alone.
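(For reference, pausing running compactions and repair validation compactions can be done roughly like this; whether it helps will depend on the cluster:)

    # Stop currently running compactions and repair validation compactions
    nodetool stop COMPACTION
    nodetool stop VALIDATION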
 
At this point, we saw that putting the seed node back in the cluster consistently worsened latencies on the seed and 8 other nodes = 9 out of the 12 nodes in the cluster.
 
So we decided to bootstrap it. During the bootstrapping and afterwards, latencies remained near 200 ms which is what we wanted for now. 
 
All we were able to see is that the seed node in question was different in that it had 5000 sstables while all others had around 2300. After bootstrap, seed node sstables reduced to 2500.
 
Why would starting C* on a single seed node affect the cluster this badly? Again, no repair was running, just the 4 compactions that run routinely on it as well as on all the others. Is it gossip?  What other plausible explanations are there?
 
----------------
Thank you