
Posted to commits@cassandra.apache.org by "Matt Byrd (JIRA)" <ji...@apache.org> on 2014/07/15 03:12:04 UTC

[jira] [Commented] (CASSANDRA-7533) Let MAX_OUTSTANDING_REPLAY_COUNT be configurable

    [ https://issues.apache.org/jira/browse/CASSANDRA-7533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061506#comment-14061506 ] 

Matt Byrd commented on CASSANDRA-7533:
--------------------------------------

Just to add a bit more context: we had a single Cassandra instance get badly stuck replaying commit logs.
It burned through 2000%+ CPU for over four hours with no end in sight, so we killed it, removed the commit logs, brought it back up and ran repair. (Thankfully, this was in QA.)

The problem can easily be reproduced by writing 100,000 CQL rows (range deletes) to the same partition key, stopping Cassandra and starting it again.
I admit this is somewhat of an anti-pattern, but it is still quite a dramatic effect from not very much data.
The problem exercised here is that:
1. The replay threads contend in the memtable, where each insert is done in a CAS loop.
2. The work done in this loop becomes ever more expensive, because RangeTombstoneList.dataSize iterates over the whole list to compute the size.
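
To illustrate the two points above, here is a minimal sketch of a CAS loop whose per-attempt cost grows with the list. It is not Cassandra's code: CasLoopSketch, its Integer entry sizes and the entriesVisited counter are all hypothetical stand-ins, and copying the list on each attempt is an artifact of the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for one partition's state in the memtable.
final class CasLoopSketch {
    // Current immutable snapshot of the per-entry sizes (in bytes).
    private final AtomicReference<List<Integer>> ref =
            new AtomicReference<>(Collections.<Integer>emptyList());
    // Counts how many entries the size calculation has walked in total.
    private final AtomicLong entriesVisited = new AtomicLong();

    // O(n) size calculation: walks every entry on every call.
    private long dataSize(List<Integer> sizes) {
        long total = 0;
        for (int s : sizes) {
            total += s;
            entriesVisited.incrementAndGet();
        }
        return total;
    }

    // Insert via a CAS loop: build a new snapshot, then try to swap it in.
    void insert(int entrySize) {
        while (true) {
            List<Integer> current = ref.get();
            List<Integer> updated = new ArrayList<>(current);
            updated.add(entrySize);
            dataSize(updated); // repeated in full on every attempt, including retries
            if (ref.compareAndSet(current, updated)) {
                return;
            }
            // Another thread won the race: all of the work above is thrown away.
        }
    }

    long entriesVisited() { return entriesVisited.get(); }

    int size() { return ref.get().size(); }

    public static void main(String[] args) {
        CasLoopSketch sketch = new CasLoopSketch();
        for (int i = 0; i < 1000; i++) {
            sketch.insert(8);
        }
        System.out.println("entries visited: " + sketch.entriesVisited());
    }
}
```

Even from a single thread with no retries, n inserts visit n(n+1)/2 entries, so 100,000 inserts walk roughly 5 billion entries. Every failed compareAndSet under contention adds another full pass on top of that.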

Point 2 is effectively fixed in 2.1: with all the off-heap allocation work, the dataSize calculation effectively becomes more online (that is, maintained incrementally rather than recomputed).
To resolve this problem in 2.0, you could also keep the dataSize tally online, or perhaps start keeping it online only once the list is big enough to cause a problem.
Doing this seemed to help a lot, but far simpler was just lowering the concurrency of the commit log replay, which can be achieved by lowering MAX_OUTSTANDING_REPLAY_COUNT (in our case, setting it to 1 seemed to help).
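
For what it's worth, keeping the tally online could look roughly like the sketch below. This is only an illustration of the idea, not the change we tested: TalliedList and its byte[] entries are hypothetical, and the real RangeTombstoneList has a different structure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tombstone list; not Cassandra's RangeTombstoneList.
final class TalliedList {
    private final List<byte[]> entries = new ArrayList<>();
    // Running total of entry sizes, updated whenever the list changes.
    private long dataSize;

    void add(byte[] entry) {
        entries.add(entry);
        dataSize += entry.length; // O(1) bookkeeping per insert
    }

    // O(1): returns the running tally instead of walking the list.
    long dataSize() {
        return dataSize;
    }

    // O(n) recomputation, kept here only to cross-check the tally.
    long recomputeDataSize() {
        long total = 0;
        for (byte[] entry : entries) {
            total += entry.length;
        }
        return total;
    }

    public static void main(String[] args) {
        TalliedList list = new TalliedList();
        for (int i = 0; i < 100_000; i++) {
            list.add(new byte[16]);
        }
        System.out.println("tally matches: " + (list.dataSize() == list.recomputeDataSize()));
    }
}
```

The trade-off is that every mutation of the list has to remember to update the tally, which is presumably why switching to it only past some size threshold might be attractive.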

Thanks,
Matt


> Let MAX_OUTSTANDING_REPLAY_COUNT be configurable
> ------------------------------------------------
>
>                 Key: CASSANDRA-7533
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7533
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jeremiah Jordan
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 2.0.10
>
>
> There are some workloads where commit log replay will run into contention issues with multiple things updating the same partition.  Through some testing it was found that lowering MAX_OUTSTANDING_REPLAY_COUNT in CommitLogReplayer.java can help with this issue.
> The calculations added in CASSANDRA-6655 are one such place where things get bottlenecked.
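
One way the limit could be made configurable is to read it from a JVM system property with a fallback. The sketch below is only an assumption about the shape of such a change, not the committed patch: the ReplayConfig class and the property name are hypothetical, and the default of 1024 is assumed rather than taken from this thread.

```java
// Hypothetical holder for the replay limit; not Cassandra's CommitLogReplayer.
final class ReplayConfig {
    // Hypothetical property name, chosen for illustration only.
    static final String PROPERTY = "example.commitlog.max_outstanding_replay_count";
    // Assumed default; the value actually used by Cassandra may differ.
    static final int DEFAULT = 1024;

    // Reads the limit from the system property, falling back to DEFAULT
    // when the property is unset or not a valid integer.
    static int maxOutstandingReplayCount() {
        int value = Integer.getInteger(PROPERTY, DEFAULT);
        // Never allow fewer than one outstanding mutation, or replay could not progress.
        return Math.max(1, value);
    }

    public static void main(String[] args) {
        System.out.println("max outstanding replay count: " + maxOutstandingReplayCount());
    }
}
```

With something like this in place, the limit could be lowered at startup with a flag such as -Dexample.commitlog.max_outstanding_replay_count=1, without rebuilding.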



--
This message was sent by Atlassian JIRA
(v6.2#6252)