Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2011/09/13 17:53:09 UTC

[jira] [Commented] (CASSANDRA-3178) Counter shard merging is not thread safe

    [ https://issues.apache.org/jira/browse/CASSANDRA-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103708#comment-13103708 ] 

Sylvain Lebresne commented on CASSANDRA-3178:
---------------------------------------------

Note that the removal of flush_after_mins in 1.0 is a problem for this patch. The reason is that we want to remove a shard corresponding to a NodeId for which we know no increment has been made after time t. For that removal to be safe, we must make sure that the compaction includes everything that was issued before time t. To ensure that, the current patch checks that the compaction started after time t + 2 * flush_after_mins. I'll update the patch to use the memtable creationTime instead.
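
To make the timing condition concrete, here is a rough sketch of the check described above. It is not the code from the attached patches: the class, method and parameter names are hypothetical, and timestamps are assumed to be in milliseconds. The second method is only a guess at what a creationTime-based check might look like, assuming that every live memtable being created after t implies that all writes issued before t have been flushed.

```java
import java.util.concurrent.TimeUnit;

// Illustrative only: names are hypothetical and this is not Cassandra's actual API.
final class ShardRemovalCheck {
    private ShardRemovalCheck() {}

    // Check described in the comment: the compaction must have started
    // strictly after t + 2 * flush_after_mins, where t is the time after
    // which no increment was made for the NodeId.
    static boolean isSafeWithFlushBound(long lastIncrementMillis,
                                        long compactionStartMillis,
                                        int flushAfterMins) {
        long margin = 2 * TimeUnit.MINUTES.toMillis(flushAfterMins);
        return compactionStartMillis > lastIncrementMillis + margin;
    }

    // ASSUMPTION: one possible replacement once flush_after_mins is gone.
    // If the oldest live memtable was created after t, nothing written
    // before t can still be sitting unflushed in memory.
    static boolean isSafeWithMemtableCreationTime(long lastIncrementMillis,
                                                  long oldestLiveMemtableCreationMillis) {
        return oldestLiveMemtableCreationMillis > lastIncrementMillis;
    }

    public static void main(String[] args) {
        long t = 1_000_000L;
        // flush_after_mins = 60, so the margin is 120 minutes.
        System.out.println(isSafeWithFlushBound(t, t + TimeUnit.MINUTES.toMillis(121), 60)); // true
        System.out.println(isSafeWithFlushBound(t, t + TimeUnit.MINUTES.toMillis(90), 60));  // false
    }
}
```

The first check depends on flush_after_mins as an upper bound on how long a write can stay unflushed, which is why removing that setting in 1.0 breaks it.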

> Counter shard merging is not thread safe
> ----------------------------------------
>
>                 Key: CASSANDRA-3178
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3178
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.5
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>              Labels: counters
>             Fix For: 0.8.6
>
>         Attachments: 0001-Move-shard-merging-completely-to-compaction.patch, 0002-Simplify-improve-shard-merging-code.patch
>
>
> The first part of the counter shard merging process is done during counter replication. This was done there because it requires that all replicas are made aware of the merging (we could rely only on nodetool repair for that, but that seems much too fragile; it's better as just a safety net). However, this part isn't thread safe, as multiple threads can do the merging for the same shard at the same time (which shouldn't really "corrupt" the counter value per se, but results in an incorrect context).
> Synchronizing that part of the code would be very costly in terms of performance, so instead I propose to move the part of the shard merging done during replication to compaction. It's a better place anyway. The only downside is that it means compaction will sometimes send mutations to other nodes as a side effect, which doesn't feel very clean but is probably not a big deal either.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira