Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2015/08/24 17:57:46 UTC

[jira] [Resolved] (CASSANDRA-8035) 2.0.x repair causes large increase in client latency even for small datasets

     [ https://issues.apache.org/jira/browse/CASSANDRA-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-8035.
---------------------------------------
       Resolution: Cannot Reproduce
    Fix Version/s:     (was: 2.0.x)

Closing as Cannot Reproduce since 2.0 is EOL.  Please reopen if you see this on 2.1+.

> 2.0.x repair causes large increase in client latency even for small datasets
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8035
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8035
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: c-2.0.10, 3 nodes per DC.  Load < 50 MB
>            Reporter: Chris Burroughs
>         Attachments: cl-latency.png, cpu-idle.png, keyspace-99p.png, row-cache-hit-rate.png
>
>
> Running repair causes a significant increase in client latency even when the total amount of data per node is very small.
> Each node serves 900 req/s, and during normal operation the 99p Client Request Latency is less than 4 ms and usually less than 1 ms.  During repair the latency increases to 4-10 ms on all nodes.  I am unable to find any resource-based explanation for this.  Several graphs are attached to summarize.  Repair started at about 10:10 and finished around 10:25.
>  * Client Request Latency goes up significantly.
>  * Local keyspace read latency is flat.  I interpret this to mean that it's purely coordinator overhead that's causing the slowdown (see the sketch below).
>  * Row cache hit rate is unaffected (and is very high).  Between these two metrics I don't think there is any doubt that virtually all reads are being satisfied in memory.
>  * There is plenty of available CPU.  Aggregate CPU used (mostly NIC) did go up during this.
> Having more/larger keyspaces seems to make it worse.  Having two keyspaces on this cluster (still with total size << RAM) caused larger increases in latency, which would have made for better graphs, but it pushed the cluster well outside of SLAs and we needed to move the second keyspace.
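>
> A minimal sketch of one way to compare the two latencies side by side while repair runs, assuming the stock Cassandra 2.0 JMX metric names exposed by the Codahale reporter and the default JMX port 7199; the keyspace name "myks" is a placeholder:
> {code:java}
> // Hypothetical JMX probe: polls coordinator-level vs. replica-local read
> // latency p99 every 5 seconds.  Metric names below are the ones Cassandra
> // 2.0 normally registers, but treat them as assumptions.
> import javax.management.MBeanServerConnection;
> import javax.management.ObjectName;
> import javax.management.remote.JMXConnector;
> import javax.management.remote.JMXConnectorFactory;
> import javax.management.remote.JMXServiceURL;
>
> public class RepairLatencyProbe {
>     public static void main(String[] args) throws Exception {
>         JMXServiceURL url = new JMXServiceURL(
>                 "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
>         try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
>             MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
>             // Whole-request read latency as seen by the coordinator.
>             ObjectName coordinator = new ObjectName(
>                 "org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency");
>             // Replica-local read latency for the keyspace under test.
>             ObjectName local = new ObjectName(
>                 "org.apache.cassandra.metrics:type=Keyspace,keyspace=myks,name=ReadLatency");
>             while (true) {
>                 double coord99 = (Double) mbsc.getAttribute(coordinator, "99thPercentile");
>                 double local99 = (Double) mbsc.getAttribute(local, "99thPercentile");
>                 // Values are in microseconds; a rising coordinator p99 over a
>                 // flat local p99 points at overhead above the storage layer.
>                 System.out.printf("coordinator p99: %.0f us  local p99: %.0f us%n",
>                         coord99, local99);
>                 Thread.sleep(5000);
>             }
>         }
>     }
> }
> {code}
> Polling one node while "nodetool repair" runs should reproduce the divergence visible in cl-latency.png vs. keyspace-99p.png: coordinator p99 climbing while local p99 stays flat.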



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)